
NAVAL POSTGRADUATE SCHOOL 
MONTEREY, CALIFORNIA 93943







THESIS 



APPLICATION OF A SILICON COMPILER TO 
VLSI DESIGN OF 

DIGITAL PIPELINED MULTIPLIERS 

by 

Dennis J. Carlson 
June 1984 



Thesis Advisor: D. E. Kirk 

Approved for public release; distribution unlimited. 



REPORT DOCUMENTATION PAGE

4. Title: Application of a Silicon Compiler to VLSI Design of Digital Pipelined Multipliers
5. Type of Report and Period Covered: Master's Thesis, June 1984
7. Author: Dennis J. Carlson
9. Performing Organization Name and Address: Naval Postgraduate School, Monterey, California 93943
11. Controlling Office Name and Address: Naval Postgraduate School, Monterey, California 93943
12. Report Date: June 1984
13. Number of Pages: 141
15. Security Class (of this report): Unclassified
16. Distribution Statement (of this Report): Approved for public release; distribution unlimited.
19. Key Words: VLSI Design, MacPitts, Pipelined Multipliers, Silicon Compiler, CAD Tools
20. Abstract: The concept and application of silicon compilers are described. The process of employing the MacPitts silicon compiler to design an 8-bit pipelined digital multiplier is presented, and the resulting design is evaluated. The process of installing and debugging the MacPitts compiler and the Caesar VLSI graphics editor on the VAX-11/780 computing facilities at NPS is documented in appendices.




Approved for public release; distribution unlimited. 



Application of a Silicon Compiler to 
VLSI Design of 

Digital Pipelined Multipliers 



by



Dennis J. Carlson 

Lieutenant Commander, United States Navy
B.S., Rensselaer Polytechnic Institute, 1969



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN ELECTRICAL ENGINEERING



from the 

NAVAL POSTGRADUATE SCHOOL 
June 1984 



ABSTRACT 



The concept and application of silicon compilers are described. The process of employing the MacPitts silicon compiler to design an 8-bit pipelined digital multiplier is presented, and the resulting design is evaluated. The process of installing and debugging the MacPitts compiler and the Caesar VLSI graphics editor on the VAX-11/780 computing facilities at NPS is documented in appendices.



TABLE OF CONTENTS

I. INTRODUCTION
   A. BACKGROUND
   B. CURRENT RESEARCH GOALS
II. APPROACHES TO SILICON COMPILATION
   A. VLSI DESIGN ACTIVITIES DOMAIN
   B. EVALUATION CAVEAT
   C. LIMITED SPECTRUM COMPILERS (TRANSLATORS)
   D. BROAD-SPECTRUM SILICON COMPILERS
      1. Floor Planners
      2. Behavioral Specification Compilers
III. USING MACPITTS
   A. THE INPUT FILE
      1. Fundamentals of the MacPitts Language
      2. Two Multiplier Examples
   B. INVOCATION OPTIONS
   C. USE OF THE MACPITTS INTERPRETER
   D. EVOLUTION OF THE 8 BIT PIPELINED MULTIPLIER
      1. Design Motivation and Constraints
      2. First Design: 3 Stages, 8 Bits on One Chip
      3. First Partitioning: 2 Bits, 1 Stage Pipeline
      4. Second Partitioning: 4 Bits, 2 Stage Pipeline
      5. Third Partitioning: 2 Bits, 4 Stage Pipeline
   E. DESIGN VALIDATION
      1. Functional Simulation
      2. Design Rule Checking
      3. Node Extraction and Event Simulation
   F. SUMMARY OF ACTIVITIES IN THE MACPITTS DESIGN CYCLE
IV. MACPITTS PERFORMANCE
   A. LAYOUT ERRORS AND INEFFICIENCIES
      1. Inefficiencies
      2. Errors
   B. ORGANELLES VS. STANDARD CELLS
   C. SOFTWARE INCOMPATIBILITIES
V. CONCLUSION
   A. SUMMARY
   B. RECOMMENDATIONS
APPENDIX A: INSTALLATION OF MACPITTS ON VAX-11/780 UNDER UNIX 4.1 AND 4.2
   A. INSTALLATION UNDER UNIX 4.1 OPERATING SYSTEM
   B. INSTALLATION UNDER UNIX 4.2 OPERATING SYSTEM
APPENDIX B: INSTALLATION OF THE CAESAR VLSI EDITOR UNDER UNIX 4.1 AND 4.2
   A. INSTALLATION UNDER UNIX 4.1
   B. INSTALLATION UNDER THE UNIX 4.2 OPERATING SYSTEM
APPENDIX C: MANUAL PAGES FOR BERKELEY DESIGN TOOLS
APPENDIX D: SIMULATION RESULTS FOR MULTIP8C MULTIPLIER
APPENDIX E: LAYOUT PHOTOGRAPHS
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST



LIST OF TABLES

I. Statistics for MacPitts Multiplier Chip Designs
II. MacPitts Source Files



LIST OF FIGURES

2.1 VLSI Design Activities Spectrum
2.2 Typical Floor Plan Produced by the F.I.R.S.T. Silicon Compiler
2.3 Floor Plan of the MacPitts Target Architecture
2.4 MacPitts Register Circuit and Timing Diagram
2.5 MacPitts Program Data Flow
3.1 Multic.mac Source File
3.2 Example of the Multic Behavioral Specification
3.3 Multip.mac Source File
3.4 Compiler Statistics for multip
3.5 A MacPitts Interpreter Session for multip
3.6 Multip8.mac Source File
3.7 Multip8.mac Source File (Continued)
3.8 Use of Ports and Registers in multip8.mac
3.9 Data Path Architecture of Multip8 Chip
3.10 Block Diagram of First Partitioning
3.11 Multip8a.mac Source File
3.12 Multip8b.mac Source File
3.13 Multip8c.mac Source File
3.14 Values: Program to Compute Multip8c Output
3.15 Mextra.log File for Mul8c.cif
3.16 Two Macro Driver Files for Event Simulation
4.1 Data Path Output Routing
D.1 MacPitts Interpreter Results
D.2 MacPitts Interpreter Results (Continued)
D.3 MacPitts Interpreter Results (Continued)
D.4 MacPitts Interpreter Results (Continued)
D.5 MacPitts Interpreter Results (Continued)
D.6 MacPitts Interpreter Results (Continued)
D.7 Event Simulation Results
D.8 Event Simulation Results (Continued)
D.9 Event Simulation Results (Continued)
E.1 multic (top), multip (bottom)
E.2 multip8 (top), multip8a (bottom)
E.3 multip8b (top), multip8c5 (bottom)
E.4 multip8c4 (top), multip8c4d (bottom)
E.5 Layout Errors in kchip2



ACKNOWLEDGEMENTS 



I would like to thank the following individuals for their assistance in the completion of this thesis:

Naval Postgraduate School
Dr. Donald Kirk
Prof. Robert Strum
Dr. Herschel Loomis
Mr. Al Wong

Massachusetts Institute of Technology
Lincoln Laboratory
Mr. Kenneth K. Crouch
Dr. Antun Domic

University of California at Berkeley
Dr. John K. Ousterhout
Dr. Keith Sklower

Stanford University
Dr. Robert Mathews
Ms. Susan Taylor

University of Kansas
Dr. Gerry L. Kelly



I. INTRODUCTION 



A. BACKGROUND 

The initial work done on the design of very large scale integrated (VLSI) circuits at the Naval Postgraduate School (NPS) used a set of software tools that require designer interaction at all levels of the design process. These tools and their use are described in a recent thesis by Conradi and Hauenstein [Ref. 1].

Their design approach centers on the use of (1) machine-generated programmable logic arrays (PLA's), specified in a language that translates Boolean equations into circuit layouts, and (2) a library of standard cell layouts from which other required circuit primitives are selected. The designer arranges the PLA's and standard cells on a "floorplan" designed by heuristic methods, and interconnects them with a network of individual wires devised by the designer and encoded as a "wirelist." The floorplan layout and the addition of interconnecting wires must be done manually, typically on graph paper at the drawing board. The results are manually encoded in an input file format readable by a layout language program ("ell" in the case of the cited research), which merges the designer's floorplan and wirelist with (1) the selected library cell layout descriptions and (2) the PLA layout descriptions produced by the separate PLA generation program. The circuit layout program then produces a description of the total design in another standard file interchange format, the Caltech Intermediate Form (CIF), described by Mead and Conway [Ref. 2: pp. 115-127].

The CIF file can then be used as a source for extracting design validation information, as well as for producing the photographic masks used for circuit fabrication.
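The sketch below makes the CIF format concrete. It emits one symbol containing two rectangles, using the CIF commands DS (define symbol), L (layer), B (box: length, width, center), DF (definition finish), C (call) and E (end). Several details are assumptions of this illustration rather than details of the NPS tools: the function name `cif_symbol`, the NMOS layer names NM (metal) and NP (polysilicon), and the default unit of 0.01 micron.

```python
from typing import List, Tuple

# One rectangle on one mask layer: (layer, x_min, y_min, x_max, y_max).
# Coordinates are assumed to be in CIF's default unit of 0.01 micron.
Rect = Tuple[str, int, int, int, int]


def cif_symbol(symbol_id: int, rects: List[Rect]) -> str:
    """Emit a CIF symbol definition holding the given boxes, place one
    instance of it, and end the file (hypothetical helper)."""
    lines = [f"DS {symbol_id} 1 1;"]            # define symbol, scale factor 1/1
    current_layer = None
    for layer, x0, y0, x1, y1 in rects:
        if x1 <= x0 or y1 <= y0:
            raise ValueError("degenerate rectangle")
        if (x0 + x1) % 2 or (y0 + y1) % 2:
            raise ValueError("box center must fall on an integer grid point")
        if layer != current_layer:               # emit L only when the layer changes
            lines.append(f"L {layer};")
            current_layer = layer
        length, width = x1 - x0, y1 - y0         # B takes x-extent, then y-extent
        cx, cy = (x0 + x1) // 2, (y0 + y1) // 2  # followed by the box center
        lines.append(f"B {length} {width} {cx} {cy};")
    lines.append("DF;")                          # finish the symbol definition
    lines.append(f"C {symbol_id};")              # call (instantiate) the symbol
    lines.append("E")                            # end of CIF file
    return "\n".join(lines)


print(cif_symbol(1, [("NM", 0, 0, 400, 200), ("NP", 0, 300, 200, 500)]))
```

A box is described by its extents and its center rather than by its corners, which is why the sketch rejects rectangles whose centers would fall between grid points.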
The design process outlined above has the advantage of giving the designer thorough control over the architecture of the circuit. The human ability to evaluate alternatives, recognize patterns, and grasp complex multi-dimensional relationships between individual elements and the whole design exceeds that of any current machine algorithm.

On the other hand, this process absorbs large amounts of the designer's time in the drudgery of planning and encoding the layout details. There are at least four problems with involving the designer at this level:

(1) It is repetitious work, and therefore error-prone. 

(2) It is slow. (Southard [Ref. 3] and others have noted 
that design costs far outweigh production costs for custom 
VLSI.) 

(3) Preoccupation with mechanical details restricts a 
designer's freedom to explore high-level architectural 
issues such as bus structure, degree of pipelining, and 
speed-complexity tradeoffs. 

(4) Major modifications to the layout are very expensive to 
make if they come late in the design cycle, i.e. after cell 
interconnection. 

B. CURRENT RESEARCH GOALS

With this background as motivation, it was decided to investigate additional VLSI computer-aided design tools that would reduce time-to-design, minimize the occurrence of human error in layout, and make it easier to explore design alternatives.

The major tool available in the VLSI research community for this purpose is MacPitts. MacPitts is a silicon compiler developed at the Massachusetts Institute of Technology's Lincoln Laboratory in 1981-1982 [Ref. 4]. Its name is derived from two early researchers, McCulloch and Pitts, who studied neurological systems from a mathematical and logical standpoint. One recent definition [Ref. 5] captures current usage of this often misunderstood term: a silicon compiler is "a program that, given a description of what a circuit is supposed to do, will produce a chip layout that implements that function in silicon." This definition leaves enough latitude for fundamentally different approaches to silicon compilation to coexist, as the following chapter demonstrates. In any case, the term compiler is apt. Like software compilers, these programs take high-level source code descriptions that are human-readable (and perhaps, but not necessarily, algorithmic). They convert these descriptions into low-level object code (a CIF file) that is directly readable by a machine. In the case of a silicon compiler, however, the machine is not a general-purpose computer. It is a photoresist mask generator at a silicon foundry that fabricates integrated circuits.

Another function that the most advanced silicon compilers perform is resource allocation. Software compilers free the programmer from deciding where in available memory to store a particular machine code word. Silicon compilers, at their best, free the designer from deciding where on the available silicon area to place a particular circuit element. Resource allocation is a one-dimensional job in software compilers, but a two-dimensional job in silicon compilers. The constraints on efficient resource allocation in silicon are severe: compactness is almost always one goal, as is speed of operation (minimum propagation delay). In memory allocation, compactness is not essential unless one is using a sequential access memory.
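A toy example shows the contrast between one-dimensional and two-dimensional allocation. The sketch below uses hypothetical function names. It places words in memory with a simple bump pointer, and it places rectangular blocks on a fixed-width chip with a naive "shelf" heuristic: blocks go left to right, and a new row starts when the current row is full. This is not the placement algorithm of any silicon compiler discussed here. It only shows that the two-dimensional problem creates gaps that the one-dimensional problem never has.

```python
from typing import List, Tuple


def allocate_1d(sizes: List[int]) -> List[int]:
    """Software-compiler style: give each item the next free address."""
    addresses, next_free = [], 0
    for size in sizes:
        addresses.append(next_free)
        next_free += size
    return addresses


def allocate_2d_shelves(blocks: List[Tuple[int, int]], chip_width: int):
    """Toy 2-D placement of (width, height) blocks in rows ("shelves") of a
    fixed-width chip; each row is as tall as its tallest block.
    Returns the lower-left corner of each block and the total height used."""
    positions = []
    x = y = row_height = 0
    for w, h in blocks:
        if w > chip_width:
            raise ValueError("block wider than chip")
        if x + w > chip_width:           # row is full: start a new shelf
            y += row_height
            x = row_height = 0
        positions.append((x, y))
        x += w
        row_height = max(row_height, h)
    return positions, y + row_height


addresses = allocate_1d([4, 2, 8])
positions, height = allocate_2d_shelves([(3, 2), (3, 4), (3, 1)], chip_width=6)
```

In this example the three blocks cover 21 units of area inside a 6 by 5 region of 30 units. The remaining 9 units are unused, and 6 of them are lost because the two blocks in the first row have different heights.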

Installation of MacPitts on the NPS VAX-11/780 computer facility was expected to be a "turn-key" operation. This was in fact not the case. A large amount of effort went into two tasks. The first was researching and performing the modifications to the host computer environment that enable it to run the MacPitts system. The second was troubleshooting the distributed MacPitts source code itself. The installation process is described in Appendix A.

MacPitts has no progressive breakpoint facilities that would allow a designer to observe or alter the layout process at any point during execution. Once invoked, MacPitts produces either a final interconnected layout, complete with bonding pads, or no layout at all. Therefore, it was considered worthwhile to implement the color graphics editor Caesar, designed by John Ousterhout at the University of California at Berkeley [Ref. 6]. This tool allows the chip layout to be examined in detail on a color CRT monitor, and permits editing of the layout. Caesar represents the layout internally as a hierarchy of cells, which yields insight into the ways that MacPitts partitions the layout process.

The installation of Caesar was less difficult than that of MacPitts. It still involved setting some site-dependent parameters, as well as finding and correcting a bug in the distributed source code. These activities are described in Appendix B. Appendix C contains a copy of the on-line manual pages for Caesar and the other Berkeley tools used in this research.
II. APPROACHES TO SILICON COMPILATION



A. VLSI DESIGN ACTIVITIES DOMAIN

When trying to understand how silicon compilers work, it is instructive to think of two design problems in the order in which they must be attacked. The first is the translation of a brief behavioral or functional description into a more precise intermediate description that is still independent of the specific implementation technology. The second is the automatic generation of a chip layout in a target semiconductor medium, using the intermediate description as a guide. It is important to separate the second activity from the first when designing a silicon compiler because of the speed at which the target semiconductor technologies are evolving. For example, complementary metal oxide semiconductor (CMOS) processes are rapidly overtaking N-channel metal oxide semiconductor (NMOS) processes. Multiple-layer metallization is also becoming more common, and minimum circuit feature sizes are shrinking as better control over the manufacturing processes is achieved. Computer architectures and functions evolve more slowly by comparison.

These two problems may be further subdivided. Werner 
[Ref. 7] has contributed the idea that a spectrum of VLSI 
design activities exists with corresponding media for the 
exchange of information by the computer-aided design tools 
employed at each band in the spectrum. (See figure 2.1.) 
Silicon compilers try to span the whole spectrum, an ambi- 
tious undertaking. 
Figure 2.1 VLSI Design Activities Spectrum [Ref. 7] (one band is labeled "(Conventional) Automatic Layout Tools (Placement and Routing)").
B. EVALUATION CAVEAT



It should be recognized that all silicon compilers designed to date have, to some extent, traded performance of the final VLSI design (as measured by operating speed and area efficiency) for reduced design time for the chip (and for the silicon compiler itself). Gross [Ref. 8] quotes estimates that broad-spectrum silicon compilers reduce design costs (time) by a factor of 20. But Wallich, in a recent survey of silicon compiler efforts [Ref. 5], states that designs produced by silicon compilers available today tend to be 15 to 200 percent larger than equivalent hand-crafted designs.

Still, as Gross notes, silicon compilers have been misunderstood by researchers. Some researchers do not fully understand the dimensionality of the VLSI design process. They believe that the design problem can be almost completely solved by the application of current software methods and tools. Others see the obvious limitations of contemporary silicon compilers but do not grasp the potential contributions to VLSI from computer science technology transfer. They believe that efficient VLSI designs will always be essentially manual. Murphy of Bell Laboratories, quoted by Werner [Ref. 7], states that "total automation is inappropriate, either now or in the foreseeable future, in anything where you have a competitive need for performance." Nevertheless, Bell Laboratories is conducting research of its own into silicon compilers. Their "Plex" project, reported in a more recent paper [Ref. 9], produces layouts of microcomputers. The input is the program (in assembly or C language) that the microcomputer is to execute.

According to Wallich, the ultimate silicon compiler is now just a dream. It will take a behavioral description and produce a geometrical description of the chip suitable for input to a mask-making machine. It will also do so for any kind of chip: a microprocessor, a signal processor, or even an analog-digital hybrid, for which the design rules are far more complex. The subtle process of architectural optimization (i.e., selecting the best floor plan from the myriad possibilities) occurs in the middle of the design activities spectrum. It has so far not been captured in an algorithm. To achieve some breadth without being overwhelmed by complexity, silicon compilers have tended to contain built-in assumptions about a "target architecture." They are optimized for producing a certain class of circuits (mostly microprocessors). They produce layouts of reasonable area and speed only for applications best suited to their target architecture.

C. LIMITED SPECTRUM COMPILERS (TRANSLATORS)

For completeness, it is necessary to mention those VLSI design tools in current use that fall short of covering the whole design spectrum. They are:

• Random logic/standard-cell place-and-route systems,

• Module compilers to implement Boolean logic, including:

    • Gate array compilers,

    • PLA generators,

    • Regular expression compilers for finite-state machines,

• Layout languages,

• Interactive graphical layout editors.
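A PLA generator is one example of a limited-spectrum module compiler. At its core, it maps sum-of-products equations onto an AND plane and an OR plane. The sketch below computes such a "personality" for a half adder and then evaluates it. The names `pla_personality` and `pla_eval` and the string encoding of the planes are hypothetical. A real generator would go on to tile layout cells according to these matrices.

```python
from typing import Dict, List, Tuple

# A product term is a tuple of (input name, polarity) literals,
# e.g. (("a", True), ("b", False)) means a AND (NOT b).
Term = Tuple[Tuple[str, bool], ...]


def pla_personality(inputs: List[str], outputs: Dict[str, List[Term]]):
    """Turn sum-of-products equations into PLA personality matrices.

    AND plane: one row per distinct product term, one column per input
      ('1' = true literal, '0' = complemented literal, '-' = input unused).
    OR plane: one row per product term, one column per output
      ('1' = the term contributes to that output)."""
    norm = {o: {tuple(sorted(t)) for t in ts} for o, ts in outputs.items()}
    terms: List[Term] = []
    for ts in outputs.values():
        for t in ts:
            key = tuple(sorted(t))
            if key not in terms:              # a shared term gets a single row
                terms.append(key)
    and_plane = []
    for t in terms:
        lits = dict(t)
        and_plane.append("".join(
            "-" if v not in lits else ("1" if lits[v] else "0") for v in inputs))
    or_plane = ["".join("1" if t in norm[o] else "0" for o in norm) for t in terms]
    return and_plane, or_plane


def pla_eval(and_plane: List[str], or_plane: List[str], bits: str) -> str:
    """Evaluate the PLA for one input vector given as a bit string."""
    active = [all(c in ("-", b) for c, b in zip(row, bits)) for row in and_plane]
    n_out = len(or_plane[0]) if or_plane else 0
    return "".join(
        "1" if any(on and row[j] == "1" for on, row in zip(active, or_plane)) else "0"
        for j in range(n_out))


# Half adder: sum = a'b + ab', carry = ab
half_adder = {
    "sum": [(("a", False), ("b", True)), (("a", True), ("b", False))],
    "carry": [(("a", True), ("b", True))],
}
and_p, or_p = pla_personality(["a", "b"], half_adder)
```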
D. BROAD-SPECTRUM SILICON COMPILERS

1. Floor Planners

a. Common Properties 

The first broad-spectrum translators of interest are the floor planners. They all employ a structural specification language in which the specification corresponds extremely closely to the designer's mental model of how the chip should be laid out. They produce, as an initial output, a skeleton of the layout similar to an architect's floor plan. Subsequently, floor planners fill the "rooms" with cells from a standard library. Some floor planners can linearly stretch cells to match up the interconnections of abutting cells (so-called "pitch matching"). Johannsen's Bristle Blocks [Ref. 10] is a pioneering example.



b. F.I.R.S.T.



The current state of the art in floor planners is represented by the F.I.R.S.T. (Fast Implementation of Real-Time Signal Transforms) silicon compiler developed at Edinburgh University [Ref. 11]. The F.I.R.S.T. compiler produces layouts of digital signal processing systems implemented as hard-wired networks of pipelined bit-serial operators. The floor plan of F.I.R.S.T. chips (see figure 2.2) consists of a central wiring channel with operators arranged as function blocks around the "waterfront." Each bit-serial operator is implemented as a separate function block, which in turn is assembled from a library of hand-designed cells. The function blocks are arranged in two rows along either side of the wiring channel, in the order of the designer's high-level specification. The channel accommodates all interconnections between the blocks. This uncomplicated and novel layout methodology leaves about 20% of the total chip area unused, because the blocks may have different heights. At present, F.I.R.S.T. supports only NMOS technology.

Figure 2.2 Typical Floor Plan Produced by the F.I.R.S.T. Silicon Compiler.
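A few lines are enough to sketch the behavior of a bit-serial operator. The adder below is an assumed generic design, not a F.I.R.S.T. library cell. It consumes one bit of each operand per clock cycle, least significant bit first, and keeps the carry in a one-bit register. A framing signal that resets the carry at word boundaries is assumed and is not modeled here. The helper names are hypothetical.

```python
from typing import Iterable, List


def to_bits(value: int, width: int) -> List[int]:
    """Two's-complement word as a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Interpret an LSB-first bit list as a two's-complement integer."""
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def bit_serial_add(a_bits: Iterable[int], b_bits: Iterable[int]) -> List[int]:
    """One full adder plus a one-bit carry register: each clock cycle
    consumes one bit of each operand (LSB first) and emits one sum bit."""
    carry = 0                                   # carry flip-flop, cleared per word
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)               # sum bit for this cycle
        carry = (a & b) | (carry & (a ^ b))     # carry held for the next cycle
    return out


result = from_bits(bit_serial_add(to_bits(5, 8), to_bits(-3, 8)))
```

A single full adder and one flip-flop can therefore add words of any length, at a cost of one clock cycle per bit. This trade of time for area is what makes hard-wired bit-serial networks compact.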

The F.I.R.S.T. software consists of a small suite of programs that provides the designer with a complete, specialized design environment. At the top level is a language compiler that accepts a structural description of the circuit in terms of a net list of bit-serial operators. The F.I.R.S.T. system contains a library of primitive operators (such as MULTIPLY, ADD, SORT, BIT DELAY, etc.). It also contains a number of more complex procedural definitions (such as Biquad, Lattice, Butterfly, etc.) that enable a range of signal processing architectures. The language compiler produces an intermediate-level format file as output. This file is used both by a layout program, which produces the mask geometry, and by a simulator. The simulator is event driven, which means that the voltage values on circuit nodes are modeled as discrete bits of data occurring at discrete time intervals. The functioning of individual operators is simulated on a word-by-word basis in response to a file of input commands. The simulator is claimed to be able to uncover timing bugs in the data stream.
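The gate-level sketch below shows the essential idea of event-driven simulation: values are recomputed only when a node changes. F.I.R.S.T. simulates at the level of whole operators and data words, so this sketch illustrates the general technique only. It is not a description of the F.I.R.S.T. simulator. It also simplifies in two ways: gate delays are fixed, and a gate's new output is computed at the moment its event is scheduled. All names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# A gate: (output node, input nodes, Boolean function, delay in time units)
Gate = Tuple[str, List[str], Callable[..., int], int]


def simulate(gates: List[Gate], initial: Dict[str, int],
             stimuli: List[Tuple[int, str, int]], t_end: int):
    """Minimal event-driven logic simulator. Work is done only when a node
    actually changes value. Returns the (time, node, value) changes."""
    values = dict(initial)
    fanout: Dict[str, List[Gate]] = {}
    for g in gates:
        for n in g[1]:
            fanout.setdefault(n, []).append(g)
    queue = list(stimuli)                  # pending events: (time, node, value)
    heapq.heapify(queue)
    trace = []
    while queue and queue[0][0] <= t_end:
        t, node, v = heapq.heappop(queue)
        if values.get(node) == v:
            continue                       # no change, so no new events
        values[node] = v
        trace.append((t, node, v))
        for out, ins, fn, delay in fanout.get(node, []):
            new = fn(*(values[i] for i in ins))
            heapq.heappush(queue, (t + delay, out, new))
    return trace


# NAND gate (delay 2) driving an inverter (delay 1)
gates = [("c", ["a", "b"], lambda a, b: 1 - (a & b), 2),
         ("d", ["c"], lambda c: 1 - c, 1)]
trace = simulate(gates, {"a": 0, "b": 1, "c": 1, "d": 0}, [(0, "a", 1)], t_end=10)
```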

A unique and useful aspect of F.I.R.S.T. is its incorporation of a translator program that converts the simulator's output into a form suitable for use with an automatic test pattern generator system.

2. Behavioral Specification Compilers
a. Common Properties 

Floor planners accept structural specifications at the top level. In contrast, the behavioral specification compilers do not require the designer to possess a prior mental model of the architecture to be designed. These systems attempt to translate a high-level behavioral description of the circuit into a geometric mask description. This is a significant step beyond floor planners.



b. Ayres' Work

Ayres was the first to write a book-length treatment of silicon compilation [Ref. 12]. Ayres' compiler starts with a synchronous logic specification of the chip behavior. This specification is then decomposed repeatedly into a hierarchy of implementing NMOS PLA's, which become successively more area-efficient as they become smaller. The system includes heuristics to manage and optimize on-chip routing among the generated PLA's. Ayres' compiler is potentially applicable to a broader class of circuits than F.I.R.S.T., but it is still not efficient for a general range of problems. The scope of applications was restricted intentionally to control complexity. The very use of PLA's as the sole basic building block restricts the area efficiency of this system. Even though the PLA's themselves become more area-efficient as they become smaller, the difficulty of managing their interconnections limits the ultimate efficiency of the layout.

c. MacPitts 

MacPitts is the only broad spectrum silicon 
compiler with which this author has had any first-hand 
experience. It is also the most widely known and most ambi- 
tious behavioral specification compiler in operation. 

The hardware specification generated by MacPitts is in the form of an NMOS technology CIF file. To cope with the complexity of this project, the designers restricted the target architectures to microprocessors consisting of a data path and a controller (see figure 2.3). Other restrictions include fixing the width of the data path to one value throughout the design, and requiring the designer to specify control and parallelism explicitly. In one sense the latter is not actually a restriction, because it affords greater generality in designs. Except for making pin assignments, the MacPitts user has no explicit control over the floor plan of his design. The MacPitts target architecture results in the same basic floor plan for all designs. Even so, this particular architecture is applicable to a greater variety of digital problems than any other scheme presently available.
Figure 2.3 Floor Plan of the MacPitts Target Architecture.

The data path portion of the layout consists of a rectangular array of units called "organelles." An organelle is a bit-wise functional unit. A standard library of functions (adder, subtracter, shifters, incrementers, comparators, etc.) is provided. If the algorithmic behavior specification calls for conditional data flow or looping, the data path may also include multiplexers, which have connections for control signals. This multiplexer organelle is not a library cell but is built into MacPitts. Data storage registers, implemented as master-slave flip-flops, are also "built-in organelles." They are instantiated in the data path if the algorithmic specification implies their use.

The vertical dimension of the data path outline 
in figure 2.3 corresponds to the number of bits in the data 
word. Longer word-lengths produce a taller chip. The 
various organelles are cascaded along the horizontal dimen- 
sion of the data path outline. 

The control portion of the layout acts on various signals, derived either from the data path or from outside the chip. It implements whatever Boolean logic is necessary (as inferred from the algorithmic specification) to generate the control signals that drive the multiplexers in the data path. The result is an implementation of a finite state machine (FSM) as described in Mead and Conway [Ref. 2]. The control unit does not use PLA's. Instead, it uses structural NOR gate arrays called "Weinberger arrays," which can implement arbitrary combinational logic functions. Weinberger [Ref. 15] demonstrates that his logic arrays have three features which contribute to efficiency in an automated circuit layout scheme:

• They simplify the formation of interconnection patterns within the framework of a standardized layout.

• They significantly reduce the required area (by eliminating unused inputs and separate interconnection areas).

• They eliminate crossing of signal nets (by using single-level wiring).
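A Weinberger array can implement arbitrary combinational logic because the NOR function alone is functionally complete. The sketch below uses hypothetical names and an assumed dictionary encoding in which each entry is one NOR gate (one column of the array). It evaluates such a network and builds exclusive-OR from five NOR gates. It models only the logic, not the layout or the MacPitts control-unit generator itself.

```python
from typing import Dict, List


def eval_nor_array(gates: Dict[str, List[str]], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate a network made only of NOR gates. Each named gate is one NOR
    whose inputs may be primary inputs or the outputs of other gates.
    Assumes the network has no feedback loops."""
    values = dict(inputs)

    def value(name: str) -> int:
        if name not in values:
            values[name] = 0 if any(value(i) for i in gates[name]) else 1
        return values[name]

    for g in gates:
        value(g)
    return values


# XOR from NOR gates only: four NORs form XNOR, and a fifth NOR inverts it.
xor_net = {
    "n1": ["a", "b"],
    "n2": ["a", "n1"],
    "n3": ["b", "n1"],
    "xnor": ["n2", "n3"],
    "xor": ["xnor", "xnor"],
}
```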
State timing is not controlled by the two-phase non-overlapping clock that is fairly standard in NMOS VLSI. Instead, a three-phase clock drives the register circuit shown in figure 2.4. This clocking scheme apparently allows a more compact layout of the register organelle, but requires an extra pin in the package.




Figure 2.4 MacPitts Register Circuit and Timing Diagram. (Waveforms: phi a, phi b, phi c, and Output over intervals t1 to t5, where t1 = static storage, t2 = isolate output, t3 = sample input, t4 = isolate input, t5 = connect to output.)
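The sequence of intervals in figure 2.4 can be modeled behaviorally. The sketch below is one assumption-laden reading of the legend. It treats the register as two storage nodes whose input and output connections open and close in the order t1 to t5. It does not model which of phi a, phi b, or phi c controls each transition. All names are hypothetical.

```python
from typing import List, Tuple

# The five intervals of figure 2.4, in order (t1 .. t5).
PHASES = ["static storage", "isolate output", "sample input",
          "isolate input", "connect to output"]


def run_register(data_in: List[int], initial: int = 0) -> List[Tuple[str, int, int]]:
    """Step a two-node (master/output) register through t1..t5 repeatedly.
    data_in[k] is the value on the data input during interval k (five per
    clock cycle). Returns (interval name, master value, output value)."""
    master = output = initial
    input_connected = output_connected = False
    trace = []
    for k, d in enumerate(data_in):
        phase = PHASES[k % 5]
        if phase == "isolate output":
            output_connected = False
        elif phase == "sample input":
            input_connected = True
        elif phase == "isolate input":
            input_connected = False
        elif phase == "connect to output":
            output_connected = True
        if input_connected:
            master = d                    # input node follows the data line
        if output_connected:
            output = master               # output node follows the stored value
        trace.append((phase, master, output))
    return trace


# Data is 1 through sampling, then drops to 0 after the input is isolated.
cycle = run_register([1, 1, 1, 0, 0])
```

The input is isolated at t4 before the output is reconnected at t5. A change on the input after sampling therefore cannot race through to the output within the same cycle.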
One of the authors of MacPitts, Siskind, quoted in [Ref. 7], admits that optimizing chip performance was not a primary design goal. Reported circuit densities were 80-100 transistors per square millimeter in 5 micron feature size NMOS. This is approximately 2 orders of magnitude lower than the state-of-the-art layouts reported in Gross [Ref. 8]. Southard contends that the cells he helped design for MacPitts could fairly easily have been made 20 percent smaller than they are [Ref. 5].

MacPitts only produces NMOS output in CIF, but the user has a choice of either 4 or 5 micron minimum feature size. The compiler handles this choice by linearly scaling all features except the pads. The pads are contained in two separate libraries for 4 micron and 5 micron designs.
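The arithmetic below shows the idealized effect of this linear scaling. It assumes that every non-pad dimension shrinks by the same ratio, so that area shrinks by the square of that ratio. It also assumes that transistor density rises in proportion to that area reduction. Under these assumptions, the 80-100 transistors per square millimeter reported for 5 micron MacPitts layouts would become the range computed here. That range is a derived figure, not a measurement, and it excludes the pads, which do not shrink.

```latex
s = \frac{4\ \mu\mathrm{m}}{5\ \mu\mathrm{m}} = 0.8,
\qquad
\frac{A_{4\,\mu\mathrm{m}}}{A_{5\,\mu\mathrm{m}}} = s^{2} = 0.64,
\qquad
\frac{80 \text{ to } 100\ \text{transistors/mm}^{2}}{0.64}
  \approx 125 \text{ to } 156\ \text{transistors/mm}^{2}
```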

From the programming viewpoint, MacPitts is a very complex system. It consists of a binary executable module of over 1.5 megabytes, which was built up as a LISP programming environment and then dumped, as described in the Franz Lisp manual [Ref. 13]. Figure 2.5 shows a synopsis of the functional elements that make up this LISP environment. Unlike the F.I.R.S.T. programs, these programs are not individually accessible, with one exception: the functional simulator, or "interpreter" as its authors call it. MacPitts runs automatically from beginning to end with no possibility of operator intervention. The only control available at the console while the compiler is running is the standard UNIX system abort signal.

Figure 2.5 MacPitts Program Data Flow.

The authors of MacPitts were careful to separate all the processing into technology-independent (front-end) and technology-dependent (back-end) portions, with the intermediate-level description being the point of division. This intermediate-level description is available to the user as an "object file" in human-readable form. It is possible, although not very practical, to write an object file directly for input to the back end of MacPitts. The object file is a long list containing 5 elements, each element being itself a list. The 5 elements are: definitions, flags, data path, control, and pins. This list is, of course, in a form readable by the layout programs.
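The object file is plain LISP list structure, so its top-level shape is easy to check mechanically. The sketch below reads a parenthesized expression and verifies that it is a list of five lists. It then labels the five lists with the element names given above. The toy object file in the example and the function names are hypothetical. The real contents of each MacPitts element are not reproduced here.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]

SECTIONS = ["definitions", "flags", "data path", "control", "pins"]


def parse_sexpr(text: str) -> SExpr:
    """Read one LISP-style parenthesized expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1                      # consume the closing ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok                        # an atom

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing text after expression")
    return expr


def check_object_file(text: str) -> dict:
    """Check the shape described in the text: one list of 5 elements,
    each itself a list. Returns the elements keyed by section name."""
    expr = parse_sexpr(text)
    if not isinstance(expr, list) or len(expr) != 5:
        raise ValueError("object file must be a list of 5 elements")
    if not all(isinstance(e, list) for e in expr):
        raise ValueError("each element must itself be a list")
    return dict(zip(SECTIONS, expr))


sections = check_object_file("((def1) () (reg1 adder) (state0) (pin1 pin2))")
```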

The layout programs produce only NMOS technology. As mentioned above, two bonding pad libraries are included: the Stanford standard cell library pads for 5 micron designs, and the MOSIS ARPA community pads for 4 micron designs. The "layout language" and CIF generation program embedded in MacPitts, L5, was written especially for the project by Crouch [Ref. 14]. It has built-in facilities to handle both NMOS and CMOS technology layouts. Therefore, expanding MacPitts to produce CMOS CIF would not entail a complete rewrite of the back-end programs.

An important feature of the MacPitts software is the functional simulator, or interpreter. A MacPitts program is not only an IC specification; it is also an algorithmic specification. The interpreter executes the specification program as a general-purpose computer would, using an interactive, screen-oriented input/output style. By invoking this option of MacPitts, the user can exercise his design and thereby validate its functional fidelity (to whatever extent the exercise is complete). Once the functional simulation is done to satisfaction, MacPitts can be restarted without setting the interpreter option. This produces a finished layout and the corresponding CIF file. Because the same language drives both the interpreter and the integrated circuit compiler, human error is reduced.

MacPitts lacks some features. It has none of the capabilities of F.I.R.S.T. to produce a test pattern to exercise the chip. It also lacks any built-in mechanism to identify worst-case path delays or to predict the maximum clock frequency of the finished chip. It does keep account of conductivity information, however, which it uses to predict chip power consumption.

MacPitts uses a "correct by construction" doctrine in the layout process. By denying the user the means to specify the layout details of the chip, this approach also denies the user the opportunity to commit design rule errors or to translate the specification program into a non-corresponding layout. But can MacPitts itself make design rule errors?

The following chapters examine how to use MacPitts to produce an integrated circuit layout, how to validate the design, and where to look for ways to improve chip performance.
III. USING MACPITTS



A. THE INPUT FILE

1. Fundamentals of the MacPitts Language

"MacPitts," the system for generating a custom integrated circuit, is also "MacPitts," the language in which the algorithm is specified. In this section the second meaning is the one implied. All of the information which specifies what functional behavior is required of a VLSI circuit is communicated to MacPitts in a single text file. This file, which must have the extension ".mac", is written using syntax which closely resembles that of the LISP programming language. Because the MacPitts compiler is implemented in LISP, it is reasonable to expect the syntax of the MacPitts design language to follow the LISP parenthesized notation. The authors of MacPitts made this choice because it eliminates the need for a separate parser.

LISP is a list processing language. Its data elements are "symbolic expressions" made up of "atoms" (fundamental word-like objects separated by spaces), lists of atoms, lists of lists of atoms and so on. One of the strengths of LISP is the ability to concatenate atoms or lists into new lists, and to perform other operations on a list or a hierarchy of lists to produce new lists modified in useful ways. LISP has many built-in functional definitions which form an "environment" of specifications for the operations to be performed on lists. These definitions are all contained in The Franz Lisp Manual [Ref. 13]. In addition to using these definitions, the LISP user is free to extend the LISP environment by defining new functions which specify other operations on lists. The operations may be simple manipulations of the atoms by partitioning or permutation, or, if the atoms which comprise the list happen to be numbers, arithmetic operations. The definitions of the operations themselves may also be assembled from lists of more primitive operational atoms. This functional extension of operations is what the authors of MacPitts have done in creating the MacPitts Lisp environment.
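Because a MacPitts source file is nothing more than nested parenthesized lists, reading it takes only two steps: split the text into atoms and parentheses, then build the nesting. The Python sketch below illustrates this. It is not the Franz Lisp reader that MacPitts actually uses, and the names tokenize and read_form are hypothetical. Following the .mac files shown later, it treats a semicolon as the start of a comment and turns decimal atoms into integers.

```python
def tokenize(text):
    """Split S-expression text into '(', ')' and atom tokens.

    Text after a ';' on any line is treated as a comment and dropped.
    """
    tokens = []
    for line in text.splitlines():
        code = line.split(";", 1)[0]
        tokens.extend(code.replace("(", " ( ").replace(")", " ) ").split())
    return tokens


def read_form(tokens):
    """Consume one symbolic expression from the front of `tokens`.

    Lists become Python lists, decimal atoms become ints, and every other
    atom (setq, +, 1+, ain, ...) stays a string.
    """
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read_form(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ')'
        return items
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return int(tok) if tok.isdigit() else tok


form = read_form(tokenize("(setq a (+ b (1+ (not c))))"))
print(form)  # ['setq', 'a', ['+', 'b', ['1+', ['not', 'c']]]]
```

The nesting of the resulting Python lists is exactly the nesting of the parentheses. This is why no separate grammar-driven parser is needed before the compiler can walk the program.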

The design of a VLSI circuit can be thought of as a list-building process in which the lists are electrical ports, registers, interconnection nets, data testing operations, and ultimately a string of words which define a unique patterning of silicon in the mask-level descriptive language, CIF. These lists are built according to rules contained in another list: the algorithmic specification source file. Although the MacPitts design language resembles LISP syntactically, its semantics are different and much more limited. Recursive definition, for example, is a powerful feature of LISP, but it is absent from the MacPitts design language. A description of the MacPitts grammar in Backus normal form is given in [Ref. 4].

In its most general form, a MacPitts "program" to specify a circuit's behavior consists of a set of "processes." Each process executes sequentially, but all of them run in parallel. The states of each process are fundamentally disjoint from those of the other processes. This allows the hardware for each process to run independently of the other processes, if desired, and in any case concurrently with the states of the other processes. The operations performed by a given process in a given state are specified by a "form." Each form corresponds to a single machine state, and is executed in one clock cycle. A state may be given a name by preceding the form with a label. Normally execution proceeds sequentially from one state to the following state in the .mac file at each clock cycle. A "go" form can be used, however, to deviate from this sequential flow by causing the named state to be executed next instead of the syntactically following state.
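The execution model just described can be mimicked in software. The Python sketch below is an illustration only, not the MacPitts interpreter. In it, a process is an ordered list of (label, action) states with its own program counter. Each action reads the old register values and writes new ones into a copy, so writes become visible only at the next clock. The action returns the label named by a "go" form, or None to fall through to the following state. The names Process, clock, top and copy_back are hypothetical. Wrapping from the last state back to the first is an assumption, since the text does not say what follows the final state.

```python
class Process:
    """A MacPitts-style process: an ordered list of (label, action) states."""

    def __init__(self, states):
        self.states = states
        self.labels = {label: i for i, (label, _) in enumerate(states) if label}
        self.pc = 0  # reset places every process in its first state

    def step(self, cur, nxt):
        """Execute the current state's form for one clock cycle."""
        _, action = self.states[self.pc]
        target = action(cur, nxt)
        if target is None:
            # Sequential flow; wrapping after the last state is an assumption.
            self.pc = (self.pc + 1) % len(self.states)
        else:
            self.pc = self.labels[target]  # "go" to the named state


def clock(processes, regs):
    """One clock cycle: every process reads `regs`; writes land in a copy."""
    nxt = dict(regs)
    for p in processes:
        p.step(regs, nxt)
    return nxt


# Hypothetical two-state process: state "top" increments a 4-bit x;
# the unlabeled second state copies x into y and then does (go top).
def top(cur, nxt):
    nxt["x"] = (cur["x"] + 1) & 0xF
    return None


def copy_back(cur, nxt):
    nxt["y"] = cur["x"]
    return "top"


counter = Process([("top", top), (None, copy_back)])
regs = {"x": 0, "y": 0}
for _ in range(4):
    regs = clock([counter], regs)
print(regs)  # {'x': 2, 'y': 2}
```

Four clock cycles take the process twice around its two states. Each state occupies exactly one cycle, including the one containing the go.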

Data is communicated between the data path and the external world through "ports," which have the same bit width as the data path. Only a single data path width definition is allowed per program. A port may be declared "input," "output," "tri-state output," or "i/o." Ports may also be declared as "internal," in which case they simply cascade the output of one data path operation to the input of another. The data path may also be specified to contain registers. Internal ports and registers differ as follows: registers can store data indefinitely after it has been clocked in, whereas ports are only electrical nodes in the data path and therefore do not store data. Ports are simply arrays of named terminals for conducting data from one point to another.

Control of operations performed on the data by the 
data path organelles is governed by the Weinberger array 
control unit. Control outputs from the control unit to the 
data path may determine, by means of their control over 
multiplexer organelles within the data path, which opera- 
tions occurring within the data path will affect downstream 
organelles. Status outputs from the data path returning to 
the control unit allow the sequence of operations performed
by the control unit to vary depending on the data present 
either in the registers or at any other point in the data 
path. The control unit functions may also be made to depend 
upon external inputs. The control unit communicates with 
the outside world using "signals," which are analogous to 
the "ports" used by the data path except that each signal 
appears on a single wire. Signals may be declared as 
"input," "output," "tri-state output," "i/o" or "internal." 
Operations performed by the data path during a given state are specified by the LISP "setq" form. The setq causes the data path to evaluate a sequence of operations on input port data, internal port data or register data. (The setq may also be used with signals.) The result of these operations is then conducted to another named port or loaded into a data path register during the next clock cycle. The compiler includes enough copies of each operator in the data path that separate processes, intended to run in parallel, do not conflict over the attempted shared use of a single resource. The data path can cascade several operations together in a single form. This allows forms such as the following example, which computes a = b - c using 2's complement arithmetic, to execute in one clock cycle:

   (setq a (+ b (1+ (not c))))

The list consisting of everything on the preceding line is a single form. There are three operators in this expression: "+," which specifies use of an adder, "1+," which specifies an incrementer, and "not," which specifies an inverter. Each operator is followed by its operands listed in symbolic notation. Therefore, the single operand of 1+ is the integer that results from evaluating the expression "(not c)." Note that there is no default hierarchy (precedence) of operations within a form. As with LISP, the order of operations in MacPitts must be specified explicitly by the use of nested parentheses.
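A few lines of Python can check the arithmetic behind this form. The sketch assumes a 4-bit data path and an adder that discards its carry out of the MSB. The text fixes neither detail for this example, and the function name subtract is hypothetical. Each step corresponds to one of the three organelles named above.

```python
WIDTH = 4                    # assumed data path width
MASK = (1 << WIDTH) - 1      # keeps results to WIDTH bits


def subtract(b, c):
    """Evaluate (+ b (1+ (not c))) on a WIDTH-bit data path."""
    not_c = ~c & MASK                # "not": inverter
    neg_c = (not_c + 1) & MASK       # "1+": incrementer, giving -c
    return (b + neg_c) & MASK        # "+": adder, carry out discarded


print(subtract(9, 3))   # 6
print(subtract(3, 9))   # 10, the 4-bit two's complement pattern for -6
```

Because all three operators are cascaded inside one form, the inverter, incrementer and adder form a single combinational path. The result is therefore available within one clock cycle.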

Sequences of setq forms normally operate sequentially, each being executed on a separate clock cycle. Several forms can instead be made to run in parallel by enclosing them within another "parallelizing form," of which "par" is an example. This gains speed over sequential operation at the cost of more hardware and hence more area in silicon. The par form is used as follows:
   (par form1 form2 form3 ...)

Of course, the results obtained by running setq forms in parallel may be quite different from those obtained by running them all sequentially within one process. Consider the following example, where "a" and "b" have already been declared registers (i.e., master-slave flip-flops):

   (par (setq a b)
        (setq b a))

This expression will exchange the contents of "a" with the contents of "b." The exchange will be done in one MacPitts clock cycle. This is possible because of the input isolation which occurs during the flip-flop operating cycle: all such data storage elements are read before they are written. On the other hand, the following process implies sequential operation of the same setq's:

   (process load1 (setq a b)
                  (setq b a))

This process will load both b and a with the original contents of b, and requires two cycles to do it. (Here "load1" merely furnishes a process name, as demanded by the MacPitts grammar.) We have used two lines and an indented format only for the sake of clarity. All the functional information needed by MacPitts is denoted by the ordering of forms within the nests of parentheses.
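The difference between the two fragments comes down to when registers are read and when they are written. In the Python sketch below, each setq is a (destination, expression) pair. The par function evaluates every right-hand side from the old register values and commits all writes at one clock edge. The process function executes one setq per clock cycle instead. This is an illustration of the semantics stated above, and the function names are hypothetical.

```python
def par(regs, *setqs):
    """(par (setq ...) ...): read every source first, then write all
    destinations together at a single clock edge."""
    new = dict(regs)
    for dest, expr in setqs:
        new[dest] = expr(regs)  # expressions see only the old values
    return new


def process(regs, *setqs):
    """(process name (setq ...) ...): one setq per state, one state per cycle.
    Returns the final registers and the number of cycles used."""
    cycles = 0
    for setq in setqs:
        regs = par(regs, setq)
        cycles += 1
    return regs, cycles


start = {"a": 1, "b": 2}
swap = (("a", lambda r: r["b"]), ("b", lambda r: r["a"]))
print(par(start, *swap))      # {'a': 2, 'b': 1}: exchanged in one cycle
print(process(start, *swap))  # ({'a': 2, 'b': 2}, 2): both get b, two cycles
```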

The "cond" form allows the conditional execution of the other forms it contains during a given state. It consists of a list of guards, only one of which is to be executed. Each guard begins with a "condition" which determines whether the remaining forms in the guard are to be executed. The first guard whose condition is true enables the execution of the forms following the condition in that guard. This is illustrated by the following example, adapted from [Ref. 4]:
   (cond (condition1 (cond (condition2 form1 form2)
                           (condition3 form3 form4 form5)
                           (t form6)))
         (condition4 (cond (condition5 form7 form8))
                     (cond (condition6 form9))
                     form10))

This example is heavily nested. Nevertheless, close examination reveals that the outermost "(cond..." has only two guards in its list, each of which contains other "(cond..." forms. The two guards are:

   (condition1 (cond (condition2 form1 form2)
                     (condition3 form3 form4 form5)
                     (t form6)))

and 

   (condition4 (cond (condition5 form7 form8))
               (cond (condition6 form9))
               form10)

If condition1 is false and condition4 is true, then form10 is executed. If condition5 is also true, then form7 and form8 are executed along with form10. Likewise, if condition6 is true, then form9 is executed in parallel as well.

The semantics of the cond statement are inherently parallel. The conditions of the alternate guards are checked in parallel. Likewise, all forms within the selected guard are executed simultaneously in one clock cycle. The compiler makes the conditions of different guards in one cond form mutually exclusive, and implements them using combinational logic in the control unit, as described above. This logic enables or inhibits, in parallel, the execution of the forms controlled by each guard.
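The guard-selection rule can be expressed as priority logic: a guard is enabled when its own condition is true and no earlier guard's condition in the same cond is true. The Python sketch below applies that rule to the nested example above. It models only which forms fire in one clock cycle, not the NOR-array hardware, and the function names enables and nested_example are hypothetical.

```python
def enables(conditions):
    """Mutually exclusive enables: guard i fires iff condition i is true
    and no earlier condition in the same cond is true."""
    out, taken = [], False
    for c in conditions:
        out.append(c and not taken)
        taken = taken or c
    return out


def nested_example(c):
    """Forms executed in one clock cycle by the nested cond example.
    `c` maps condition1 ... condition6 to booleans; t is always true."""
    fired = set()
    guard1, guard4 = enables([c["condition1"], c["condition4"]])
    if guard1:
        g2, g3, g_t = enables([c["condition2"], c["condition3"], True])
        if g2:
            fired |= {"form1", "form2"}
        if g3:
            fired |= {"form3", "form4", "form5"}
        if g_t:
            fired |= {"form6"}
    if guard4:
        # Two independent one-guard conds plus an unconditional form,
        # all executed in parallel within the same guard.
        if c["condition5"]:
            fired |= {"form7", "form8"}
        if c["condition6"]:
            fired |= {"form9"}
        fired.add("form10")
    return fired


conds = dict(condition1=False, condition2=True, condition3=False,
             condition4=True, condition5=True, condition6=False)
print(sorted(nested_example(conds)))  # ['form10', 'form7', 'form8']
```

Note that condition2 being true has no effect here. Its guard lies inside the first outer guard, which was not selected.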

Note that the form:

   (cond (t form1 form2 form3 ...))

is used to enable parallel execution of several forms during one clock cycle without being dependent on any condition. (The "t" stands for "true.") The "(par..." form already encountered is actually just a shorthand macro expression for the "(cond (t form..." form.

In a MacPitts layout, the conditions are formed in the control unit, which is a Weinberger array of NOR gates [Ref. 15]. Therefore, they are not limited to the sum-of-products notation used by PLA-based finite state machine compilers. The conditions are derived from signals arriving on an input pin, signals from the data path, or signals arriving from other processes. More complex conditions can be constructed from these signals using the logical operators "and," "or" and "not" to build arbitrary Boolean expressions. These operators are part of the MacPitts library of functions. Thus, the cond statement is one of the most powerful features for providing high-performance designs.



With this brief and somewhat condensed description of the features available in the MacPitts algorithmic language, the way is prepared to understand an example of some code which will produce a complete integrated circuit chip. A full, detailed description of all the facilities of MacPitts is found in a report authored by its creators [Ref. 16], which also serves as a fairly complete users' manual.



2. Two Multiplier Examples

Consider, line by line, figure 3.1, which is a listing of the file multic.mac. This example and the one which follows it are inspired by similar ones in [Ref. 16]. The file contains all of the design information needed by MacPitts to produce a 4-bit combinational multiplier.

   1  ; multiplier, no state combinational
   2  (program multic 4
   3    (def 1 ground)
   4    (def ain port input (2 3 4 5))
   5    (def bin port input (6 7 8 9))
   6    (def res port output (10 11 12 13)) ; result
   7    (def r0 port internal)
   8    (def r1 port internal)
   9    (def r2 port internal)
  10    (def 14 phia)
  11    (def 15 phib)
  12    (def 16 phic)
  13    (def 17 power)
  14    (always
  15      (cond ((bit 0 bin) (setq r0 (>> (bit 0 ain) ain)))
  16            (t (setq r0 0)))
  17      (cond ((bit 1 bin) (setq r1 (>> (bit 0 (+ r0 ain)) (+ r0 ain))))
  18            (t (setq r1 (>> (bit 0 r0) r0))))
  19      (cond ((bit 2 bin) (setq r2 (>> (bit 0 (+ r1 ain)) (+ r1 ain))))
  20            (t (setq r2 (>> (bit 0 r1) r1))))
  21      (cond ((bit 3 bin) (setq res (>> (bit 0 (+ r2 ain)) (+ r2 ain))))
  22            (t (setq res (>> (bit 0 r2) r2))))))

Figure 3.1 Multic.mac Source File.

On any line, text following a semicolon is treated as a comment, which the compiler ignores. Line 2 tells the compiler that a "program" (which is another way of saying "circuit design") called "multic" starts here, and that the data path is 4 bits wide. Because the data path is only 4 bits wide, this simple multiplier can only output numbers from 0 to 15. Even though the input ports are also four bits wide, we must restrict the inputs to those whose product falls in the range 0 to 15. Furthermore, if this algorithm is to give correct results for all multipliers without overflow, the leading bit of the multiplicand must be zero. No provision is made to output a flag if the dynamic range of the multiplier is exceeded.

Lines 3 through 13 declare the various signals and integer data words input to, output from and existing within the multiplier. Line 3 assigns the ground connection to pin 1, which is always in the upper left corner of the layout; subsequent pin numbers proceed clockwise from this point around the layout perimeter. Line 4 assigns pins 2-5 to an input port labeled "ain." This input is the multiplicand. By MacPitts convention, the most significant bit (MSB) of ain is read from the first pin on the list, pin 2, and the least significant bit (LSB) from the last pin on the list, pin 5. Line 5 similarly defines the multiplier input port, "bin." Line 6 assigns an output port labeled "res" (for result) to another block of 4 pins. This port also serves as the accumulator for the fourth and final partial product. Lines 7 through 9 define 3 internal ports (necessarily of width 4 bits) labeled r0, r1 and r2. These serve to cascade the three stages of a standard shift and add algorithm. Each of these ports holds one of the first three partial products, and each partial product results from operations conditioned on one of the multiplier bits. Lines 10 through 12 assign pins to the three-phase clock, whether or not the circuit uses that clock. In multic.mac the clock is not used. Line 13 defines the +5 volt direct current power, Vdd, connected to pin 17.

Line 14 signifies that the functions which follow, up to the matching right parenthesis on line 22, are to execute on every clock cycle. The "(always..." form is really the "(process..." form, reduced to a single state. Moreover, in this case the design uses the "(always..." form and its data path contains only ports and no registers. As a result, the inputs will affect the result after an interval governed only by the sum of the physical gate delays in the data path and control unit. There is no controlled latency in the data path, because there are no registers in this design in which to store data.

Lines 15 through 22 contain the shift and add scheme. In lines 15 and 16 the controller is told to examine bit 0 (the LSB) of bin. If it is high (true), the r0 port takes on the value of the ain port rotated right by one bit; that is, r0 is actually connected by means of a multiplexer organelle to a right-rotated version of ain. The shift-right-one-bit form, ">>," takes two arguments. The second argument specifies what data word is being shifted, and the first tells what to put in the MSB of that data word. Thus, as it is applied in this case, the shift form can also perform a rotate. If bit 0 of bin is not high, then, by line 16, the r0 port (all 4 bits) is connected to ground.

In lines 17 and 18 the controller is told to examine bit 1 of bin. If it is high, then r1, the next internal port in the data path, is connected to a right-rotated version of the sum of r0 and ain. The adder organelle in MacPitts performs this summation as a standard ripple carry full addition. Note again that the expression

   (bit 0 (+ r0 ain))

in line 17 turns the single shift operator into a right rotate operator by making the MSB of r1 contain the same value as bit 0 of the sum of r0 and ain. If bit 1 of bin is low, on the other hand, line 18 instructs the controller to connect r1 simply to a right-rotated version of r0. Note that none of these operations performs a rotation in the sense that a shift register would. The controller merely sets up the interconnections between organelles in different ways, which gives the appearance of forwarding a rotated version down the data path. Also note that even though the addition form appears twice in line 17, logically only one adder need be instantiated, since the operands are identical in both occurrences. MacPitts, too, can recognize this, and will not waste space creating more adders than the minimum necessary.

In lines 19 and 20 the controller examines bit 2 of bin. If it is high, port r2 is connected to a right-rotated version of the sum of r1 and ain. If bit 2 of bin is low, r2 is connected to a right-rotated version of r1. In lines 21 and 22 the controller finally examines the MSB, bit 3, of bin. If it is high, the output port, res, is connected to a right-rotated version of the sum of r2 and ain. If bit 3 of bin is low, res is connected to a right-rotated version of r2. For concreteness, a schematic trace of this algorithm in action on the problem "4 x 3 = 12" is presented in figure 3.2.
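The observation that the two occurrences of (+ r0 ain) need only one adder amounts to counting distinct operator subexpressions. The Python sketch below does this for lines 17 through 22 of figure 3.1, written out by hand as nested Python lists. It is a simplification for illustration, not MacPitts's actual allocation algorithm. That algorithm, as noted earlier, must also provide separate operator copies for processes that run in parallel. The function names freeze, distinct_ops and stage are hypothetical.

```python
def freeze(form):
    """Convert nested lists to nested tuples so forms can be set members."""
    return tuple(freeze(f) for f in form) if isinstance(form, list) else form


def distinct_ops(form, op, found=None):
    """Collect the distinct (op ...) subforms anywhere inside `form`."""
    if found is None:
        found = set()
    if isinstance(form, list):
        if form and form[0] == op:
            found.add(freeze(form))
        for sub in form:
            distinct_ops(sub, op, found)
    return found


def stage(k, src, dest):
    """One cond from lines 17-22 of multic.mac, for multiplier bit k."""
    return ["cond",
            [["bit", k, "bin"],
             ["setq", dest, [">>", ["bit", 0, ["+", src, "ain"]], ["+", src, "ain"]]]],
            ["t", ["setq", dest, [">>", ["bit", 0, src], src]]]]


body = [stage(1, "r0", "r1"), stage(2, "r1", "r2"), stage(3, "r2", "res")]
adders = distinct_ops(body, "+")
print(len(adders))  # 3 distinct adders for the six "+" forms in lines 17-22
```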



   ain = 4   0100
   bin = 3   0011

   Algorithm Statement                                Result
   (setq r0 (>> (bit 0 ain) ain))                     r0 = 2     0010
   (setq r1 (>> (bit 0 (+ r0 ain)) (+ r0 ain)))       r1 = 3     0011
   (setq r2 (>> (bit 0 r1) r1))                       r2 = 9     1001
   (setq res (>> (bit 0 r2) r2))                      res = 12   1100

Figure 3.2 Example of the Multic Behavioral Specification.
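A short Python model of multic reproduces the trace in figure 3.2 and extends it to every legal input. The model reads (>> m x) as "shift x right one bit and place m in the MSB," as described above. It also assumes that the 4-bit adder discards its carry out, a point the text does not state. The helper names are hypothetical. Because the data path is purely combinational, the whole design is a single function of ain and bin.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def bit(n, x):
    """(bit n x): bit n of x, with bit 0 the LSB."""
    return (x >> n) & 1


def shr(msb, x):
    """(>> msb x): shift x right one place and put msb in the MSB."""
    return ((msb << (WIDTH - 1)) | (x >> 1)) & MASK


def add(x, y):
    """(+ x y): WIDTH-bit adder; the carry out of the MSB is discarded."""
    return (x + y) & MASK


def rotr(x):
    """(>> (bit 0 x) x): right rotate by one bit."""
    return shr(bit(0, x), x)


def multic(ain, bin_):
    """Return (r0, r1, r2, res) as lines 15-22 of multic.mac define them."""
    r0 = rotr(ain) if bit(0, bin_) else 0
    r1 = rotr(add(r0, ain)) if bit(1, bin_) else rotr(r0)
    r2 = rotr(add(r1, ain)) if bit(2, bin_) else rotr(r1)
    res = rotr(add(r2, ain)) if bit(3, bin_) else rotr(r2)
    return r0, r1, r2, res


print(multic(4, 3))  # (2, 3, 9, 12), matching figure 3.2
```

Exhaustive checking of all 4-bit inputs whose product is at most 15 shows that res equals the product. The low-order product bits are rotated into the top of the word and come back around after the fourth rotation.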



For comparison, consider now another design. This one is specified by the file multip.mac, shown in figure 3.3. It is a four-bit pipelined multiplier in which the product does not appear at the result port until the third clock cycle after values have been applied to the inputs, ain and bin. The combinational design is most easily changed to a pipelined design in two steps.

   1  ; multiplier, with pipelining
   2  (program multip 4
   3    (def 1 ground)
   4    (def ain port input (2 3 4 5))
   5    (def a0 register)
   6    (def a1 register)
   7    (def a2 register)
   8    (def bin port input (6 7 8 9))
   9    (def b0 register)
  10    (def b1 register)
  11    (def b2 register)
  12    (def res port output (10 11 12 13))
  13    (def r0 register)
  14    (def r1 register)
  15    (def r2 register)
  16    (def 14 phia)
  17    (def 15 phib)
  18    (def 16 phic)
  19    (def reset signal input 17)
  20    (def 18 power)
  21    (always
  22      (cond ((bit 0 bin) (setq r0 (>> (bit 0 ain) ain)))
  23            (t (setq r0 0)))
  24      (cond ((bit 1 b0) (setq r1 (>> (bit 0 (+ r0 a0)) (+ r0 a0))))
  25            (t (setq r1 (>> (bit 0 r0) r0))))
  26      (cond ((bit 2 b1) (setq r2 (>> (bit 0 (+ r1 a1)) (+ r1 a1))))
  27            (t (setq r2 (>> (bit 0 r1) r1))))
  28      (cond ((bit 3 b2) (setq res (>> (bit 0 (+ r2 a2)) (+ r2 a2))))
  29            (t (setq res (>> (bit 0 r2) r2))))
  30      (cond (reset (setq a0 0)
  31                   (setq b0 0)
  32                   (setq a1 0)
  33                   (setq b1 0)
  34                   (setq a2 0)
  35                   (setq b2 0))
  36            (t (setq a0 ain)
  37               (setq b0 bin)
  38               (setq a1 a0)
  39               (setq b1 b0)
  40               (setq a2 a1)
  41               (setq b2 b1)))))

Figure 3.3 Multip.mac Source File.

First, the three internal ports of multic, r0, r1 and r2, are all redefined as registers. Then six other new registers, a0-a2 and b0-b2, are defined to send successive values of the inputs ain and bin down the pipe in step with their corresponding partial products. The ease with which this is done (from a user's point of view) is evidence of the power of MacPitts to create custom designs.
Referring to figure 3.3, we see that the shift and add algorithm, lines 22-29, is the same as that of multic.mac, with one difference. The second through fourth stages take their multiplicand and multiplier bits from the pipeline registers a0-a2 and b0-b2 rather than directly from ain and bin.

In line 19, pin 17 is defined as a "reset" signal input. The reset signal is required for any MacPitts design which uses one or more "process" forms, so that the program counters for all processes can always be reset to the same known state. This is obviously vital when two or more processes on the same chip must be synchronized. In the multip design, however, which uses the "(always..." form, the reset signal performs no such built-in automatic function. The reset signal is also available for user-specified functions. In this case it is used only to signal a setq of all internal multiplier and multiplicand registers to zero, instead of passing the values one more step down the pipeline. Therefore, the reset is not essential to the pipelined multiplier operation here. It only allows the pipeline to be emptied out and inhibits any new input data from propagating to completion, for whatever that may be worth in the intended application. It is included here for illustration only. Recall that propagation of all input data in the pipeline (lines 30-35 if reset is true, or lines 36-41 if reset is false) also occurs in a single clock cycle, because these setq's are enclosed in the "(cond..." form, which causes them to be executed in parallel.
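A cycle-by-cycle model makes both the three-cycle latency and the one-product-per-cycle throughput of multip visible. The Python sketch below reuses the assumptions of the multic model: a 4-bit adder with its carry discarded, and >> as described above. Each call to multip_cycle is one clock edge, at which every register in figure 3.3 is loaded from the old values. Because res is an output port, it is computed combinationally from r2, a2 and b2. Starting all registers at zero is an assumption; the MacPitts interpreter would instead report them as undefined. The function names are hypothetical.

```python
MASK = 0xF  # 4-bit data path


def bit(n, x):
    return (x >> n) & 1


def rotr(x):
    """(>> (bit 0 x) x): rotate a 4-bit word right by one place."""
    return ((x & 1) << 3) | (x >> 1)


def add(x, y):
    """4-bit adder; the carry out of the MSB is discarded (assumed)."""
    return (x + y) & MASK


def multip_cycle(s, ain, bin_, reset=False):
    """One clock edge of multip.mac; every setq reads the old register values."""
    # Lines 22-27: the three registered partial products.
    n = {
        "r0": rotr(ain) if bit(0, bin_) else 0,
        "r1": rotr(add(s["r0"], s["a0"])) if bit(1, s["b0"]) else rotr(s["r0"]),
        "r2": rotr(add(s["r1"], s["a1"])) if bit(2, s["b1"]) else rotr(s["r1"]),
    }
    if reset:  # lines 30-35: clear the operand registers
        n.update(a0=0, b0=0, a1=0, b1=0, a2=0, b2=0)
    else:      # lines 36-41: pass operands one stage down the pipe
        n.update(a0=ain, b0=bin_, a1=s["a0"], b1=s["b0"], a2=s["a1"], b2=s["b1"])
    return n


def res_port(s):
    """Lines 28-29: res is a port, so it follows r2, a2 and b2 combinationally."""
    return rotr(add(s["r2"], s["a2"])) if bit(3, s["b2"]) else rotr(s["r2"])


# Assumption: all registers start at zero (the interpreter would say "undefined").
state = dict.fromkeys(["r0", "r1", "r2", "a0", "a1", "a2", "b0", "b1", "b2"], 0)
outputs = []
for ain, bin_ in [(4, 3), (3, 5), (2, 7), (1, 15), (0, 0), (0, 0)]:
    state = multip_cycle(state, ain, bin_)
    outputs.append(res_port(state))
print(outputs)  # [0, 0, 12, 15, 14, 15]
```

The products 12, 15, 14 and 15 emerge on consecutive cycles, each three clock edges after its operands were applied. This is the throughput advantage the pipeline buys with its extra registers.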

B. INVOCATION OPTIONS

Equipped with one or more .mac files written to reflect the desired behavior of a circuit, the user is ready to run macpitts.¹ The form of the command line invocation from the UNIX shell is simply

   % macpitts <program_name> <options>

where <program_name> would be either multic or multip in the case of the previous examples, and <options> is any or none of the words from the list:

   stat*       nostat
   herald      noherald*
   cif*        nocif
   obj*        noobj
   int         noint*
   opt-d*      noopt-d
   opt-c*      noopt-c
   4u          5u*

Here the starred options are the defaults, and each word in the left column is mutually exclusive with the word beside it in the right column.

¹The name assigned to the executable binary file on the UNIX operating system which embodies the MacPitts system is "macpitts."

The "stat" option tells macpitts to output statistics about the chip design to the standard output device (normally the terminal screen) as various parameters are calculated. Figure 3.4 shows the statistics generated for the multip chip.

   1  Statistics for project multip
   2  options: (5u herald opt-d opt-c stat obj cif)
   3  Maximum control depth is 4
   4  Number of gates is 60
   5  Data-path has 25 Units
   6  Control has 69 columns
   7  Circuit has 1129 transistors
   8  Control has 17 tracks
   9  Power consumption is 0.172120 Watts
  10  Data-path internal bus uses 5 tracks
  11  Dimensions are 6.320000 mm by 2.847500 mm
  12  Memory used - 526K
  13  Compilation took 30.432777 CPU minutes
  14  Garbage collection took 18.520277 CPU minutes
  15  For a total of 796 garbage collections

Figure 3.4 Compiler Statistics for multip.

The meaning of these statistics is as follows.

Line 1 simply echoes the program name which was given at the beginning of the multip.mac source file.

Line 2 summarizes the invocation options in effect, either by user selection or by default.

Line 3 gives the worst-case number of logic levels between any input and any output in the control unit.

Line 4 gives the total number of NOR gates needed in the control unit.

Line 5 is the number of data path "organelle units," where an organelle unit is a word-length assembly of organelle bits. This number is the same as the number of elements in the data path list of the multip.obj file.

Line 6 is the number of vertical metal columns in the control array, excluding the ground columns.

Line 7 is the total number of transistors in the circuit, including the data path, control unit, and all bonding pads.

Line 8 is the stack height of the horizontally running polysilicon lines used to interconnect the control unit internally.

Line 9 is an estimate of the worst-case static power consumption of the chip. It is obtained from the layout topology, a 5 volt power supply, and heuristic values of undetermined origin for the conductivity of each electrical feature.

Line 10 is the maximum stack height of horizontally placed polysilicon lines, per bit in the data path, needed to interconnect the organelles.

Line 11 is the overall outline size of the chip layout.

Line 12 is the peak storage allocation demanded by macpitts during the run.

Line 13 is the CPU time required for compilation and layout. It is always less than the apparent running time, by an amount which depends on the average system usage rate.

Lines 14 and 15 reflect a function of Franz Lisp, garbage collection, wherein previously used storage locations are reclaimed for the available memory list. The last three statistics were probably included because macpitts can be very demanding of computing resources.
The "herald" option outputs messages to the terminal screen at each milestone in the sometimes lengthy compilation process. These reassure the user that macpitts is still running. Each herald line reports what point in the design process macpitts is currently working on. It also begins with the current accumulated CPU time and CPU garbage collection time, in units of sixtieths of a second.

The "cif" option keys the compiler to output a mask-level description .cif file in the Caltech Intermediate Form. The cif option is normally not deselected unless available disk storage space is limited and the user is interested only in reading the statistics for his compiled design. (The cif file for a relatively simple design, multip.cif, is over 158 kilobytes long.) If no cif is produced on a given macpitts run, the entire layout process must be repeated to obtain a cif file later. This is done most expeditiously by running macpitts with the noobj option.

The "noobj" option tells macpitts to start with a previously created object file (the output of the macpitts "front end") rather than a source file. MacPitts will then effectively start at the "back end," doing the layout and outputting statistics and cif, assuming these are included in the options list.

"Int" tells macpitts to use the interpreter mode, which allows functional simulation of the chip without actually performing the layout and generating a .cif file.

"Opt-c" and "opt-d" invoke optimization routines for normalization of the combinational logic of the control unit. Investigation of the four possible combinations of these two options reveals that they do not affect the overall dimensions of the final 8-bit multiplier design (to be described later). This is probably because the pins, data path layout and bus wiring dominate the chip area, not the control unit, which is comparatively small for this chip. The compilation time required, however, was approximately 20 percent greater when opt-c and opt-d were used than when they were not. Using opt-c and opt-d does reduce the complexity of the control unit, and therefore should reduce signal delays, to the benefit of operating speed.



The "4u" option sets the minimum feature size for the layout to 4 microns. Accordingly, lambda, the commonly used parameter which represents the half line width dimension, is set to 200 centimicrons (2 microns).

Another option, logo, was available in the original macpitts, but is not supported at NPS because suitable font files are not currently available.
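Lambda-based dimensions make the effect of this option easy to quantify. By the same half-feature-size rule, the default 5u option would imply a lambda of 250 centimicrons (2.5 microns). That value is an inference, not a figure quoted from the MacPitts documentation. The Python sketch below converts the multip dimensions reported in figure 3.4 (a 5u run) into lambda units and computes the chip area. The function names lambda_microns and mm_to_lambda are hypothetical.

```python
CENTIMICRONS_PER_MICRON = 100


def lambda_microns(feature_microns):
    """Lambda is half the minimum feature size (the half line width)."""
    return feature_microns / 2


def mm_to_lambda(mm, lam_microns):
    """Express a length given in millimetres as a multiple of lambda."""
    return mm * 1000 / lam_microns


lam_4u = lambda_microns(4)   # 2.0 microns, i.e. 200 centimicrons
lam_5u = lambda_microns(5)   # 2.5 microns (inferred for the 5u default)

width_mm, height_mm = 6.320000, 2.847500   # figure 3.4, multip, 5u run
print(lam_4u * CENTIMICRONS_PER_MICRON)    # 200.0
print(round(mm_to_lambda(width_mm, lam_5u)), round(mm_to_lambda(height_mm, lam_5u)))
print(round(width_mm * height_mm, 2))      # chip area in square millimetres
```

Under that inference, the multip layout is about 2528 by 1139 lambda and occupies roughly 18 square millimetres.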



C. USE OF THE MACPITTS INTERPRETER

Invoking macpitts with the int option should be the first step in every MacPitts design cycle. MacPitts has good facilities for catching grammatical errors in the user's .mac source code, and these operate whether or not the interpreter is invoked. After the .mac file passes the grammar checks, the interpreter allows the extracted algorithmic description to be exercised with arbitrary inputs. The results are displayed on the screen to indicate whether the design is functionally correct. Assuming the user's path list is set up in the .login file to include the directory /vlsi/macpit, the following command can be issued:

   % macpitts multip int herald

This will cause macpitts to scan the multip.mac source file and extract from it the circuit behavior information. Then macpitts will display a table of all declared ports, registers, flags, signals and processes, noting that they are all currently undefined. At this point the user may display a menu of interactive commands which clearly states how to interact with the interpreter. The user can set the values of input ports and signals as desired. Setting the input ports alone will not necessarily define all internal ports. Generally several clock cycles must be simulated before the chip internals are all defined. MacPitts tells the user which antecedents stand in the way of resolving data definitions. Next the user will probably single-step (or multi-step) the macpitts clock while observing the effect on the internal registers and output port(s) after each cycle. There is also provision to write out the current state of the circuit to a file, multip.int. Any number of states can be saved by appropriate renaming of files as they are written. Macpitts does not allow the user to specify a different file name for each state saved. Instead, each newly written .int file can immediately be given a unique name from an adjacent terminal logged on to the same account as the one running macpitts. This is completely feasible under UNIX.

As an example, figure 3.5 shows a concatenated listing
of 4 such files from a single session with the macpitts
interpreter. As would be expected, the format of these
files is that of a LISP list, whose meaning can be clearly
inferred because it follows the same syntax as the MacPitts
language itself. The first file, lines 1-14, is a dump of
the state of the circuit after setting the input ports ain
and bin to 4 and 3, respectively, and the reset signal to
false. Note that all data downstream of the inputs is still
undefined at this point. Lines 16-28 show the result after
one clock cycle. Lines 30-42 show the result after two
clock cycles. Lines 44-56 show the result after the third
clock cycle, when the result, 12, is present for the first
time on the output port. Note that at this point the input
data, which was never changed during this session, has also
propagated down the three stage pipeline. Of course, one
would normally not use a pipelined processor with static
data, because the advantage of higher throughput is wasted.
The exercise only serves to demonstrate the behavior of the
interpreter option.

Figure 3.5 A MacPitts Interpreter Session for multip.

Two points of practical interest should be made before
closing the interpreter discussion. First, it should be
observed that the bottom lines of text in the terminal
display will be jumbled on the ADM-36 terminals because the
/etc/termcap libraries in UNIX version 4.2 differ slightly
from those in UNIX version 4.1. Proper screen presentation
is obtained, however, if the GIGI terminal is used. Second,
the interpreter runs very slowly. It is not unusual during
hours of heavy system usage for one to two minutes of
terminal time to elapse while the interpreter is processing
a single command to cycle the clock. At night, with only 2
users logged on, this clocking operation takes only ten to
fifteen seconds.

D. EVOLUTION OF THE 8 BIT PIPELINED MULTIPLIER

1. Design Motivation and Constraints

One possible application for a digital pipelined
multiplier of unsigned integers is as part of a high speed
digital filter realization. Work done by Loomis and Sinha
[Ref. 17] indicates that the impact of pipelining delays on
the behavior of digital recursive filters can be compensated
for by adjusting the filter weights. Furthermore, their
work shows that the stability of the filter can be improved
by increasing the number of pipeline stages. It was decided
that the design of a multiplier for such applications could
be a suitable vehicle from which to study the MacPitts
compiler.

The design of circuits which can be fabricated using
the available ARPA/MOSIS implementation service is
constrained by two standard parameters: a maximum project
size of 6890 x 6300 microns, and a maximum bonding pad count
of 64 pins. To fully explore the capabilities of MacPitts,
it is probably most enlightening to proceed in steps toward
the ultimate design.

2. First Design: 3 Stages, 8 Bits on One Chip

To better appreciate the issues involved, the first
design is an expansion of multip.mac to an 8 bit wide data
path with enough "cond" forms to realize an 8 bit multipli-
cation. Note, however, that the MSB of the multiplicand
(ain) must be zero to avoid overflows of the partial product
and result ports. Two output ports are used, one for the
high order 8 bits of the result (hres), and one for the low
order 8 bits of the result (lres). Together these ports
form a 16 bit product. One expects the hres MSB always to
be zero, because the largest valid product is
127 x 255 = 32385, which is less than 2^15 = 32768. Because
the design has three sets of registers, there are three
stages of pipelining, and there is room in the chip for
three distinct multiplication problems to be in process
simultaneously. A speed vs. area tradeoff is effected by
alternating ports with registers in the data path. Ports
consume less area than registers. However, ports also
introduce more delay in the pipeline stages (whose
boundaries are defined by registers), thereby lowering the
maximum clock frequency. To further save area, the
multiplier bits from bin share space in the low order
intermediate results registers (lr0, lr1, lr2) and ports
(lp0, lp1, lp2, lp3, lres) by using the following device:
after each bit of the multiplier is tested, it is shifted
off the right end of the register/port, leaving room at the
left end for another bit of the low order result to be
shifted in. The source file for this design, multip8.mac,
is shown in figures 3.6 and 3.7. This file was arrived at
after first considering what resources would be needed to
perform the multiplication. Then register/port templates
were written down on paper, and the flow of data traced for
a specific case. Next the algorithm depicted by the data
flow was translated into the MacPitts language, resulting in
a diagram resembling the style of figure 3.2. Finally the
definitions, conditions, and reset functions were added to
complete the multip8.mac file. Figure 3.8 partially illus-
trates the manner in which this was done for the example
104 x 22 = 2288. Only the first pipeline stage is shown,
representing the first two multiplier bits.
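The shift-and-add device described above can be checked with a small cycle-level C model of multip8. This is a sketch under stated assumptions, not the MacPitts semantics themselves: it assumes that `(>> x)` fills the vacated MSB with 0, that `(>> b x)` fills it with bit `b`, and that `+` is an 8-bit adder whose carry is discarded. Ports are treated as combinational, so each register stage is fed by two steps, and all registers load together on one clock. The names `step`, `Multip8`, `multip8_clock` and `multip8_result` are hypothetical, and the reset path for the a registers is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* One conditional add and shift, modeled on one cond form of multip8.mac:
   if bit 0 of l is set, add a to h (8-bit adder, carry discarded); then shift
   the pair (h, l) right one place, moving bit 0 of the sum into the MSB of l. */
static void step(uint8_t a, uint8_t *h, uint8_t *l) {
    uint8_t sum = (*l & 1u) ? (uint8_t)(*h + a) : *h;
    *l = (uint8_t)((*l >> 1) | ((sum & 1u) << 7));
    *h = (uint8_t)(sum >> 1);
}

typedef struct {
    uint8_t a0, a1, a2;                    /* multiplicand pipeline registers */
    uint8_t hr0, lr0, hr1, lr1, hr2, lr2;  /* partial-product registers */
} Multip8;

/* One clock: every register is loaded from values computed from the old state. */
static void multip8_clock(Multip8 *m, uint8_t ain, uint8_t bin) {
    assert((ain & 0x80u) == 0);            /* MSB of the multiplicand must be zero */
    Multip8 next = *m;
    uint8_t h = 0, l = bin;                /* hp0/lp0 start from a zero partial product */
    step(ain, &h, &l);                     /* bin   -> hp0/lp0 */
    step(ain, &h, &l);                     /* hp0   -> hr0/lr0 */
    next.hr0 = h; next.lr0 = l;
    h = m->hr0; l = m->lr0;
    step(m->a0, &h, &l);                   /* hr0   -> hp1/lp1 */
    step(m->a0, &h, &l);                   /* hp1   -> hr1/lr1 */
    next.hr1 = h; next.lr1 = l;
    h = m->hr1; l = m->lr1;
    step(m->a1, &h, &l);                   /* hr1   -> hp2/lp2 */
    step(m->a1, &h, &l);                   /* hp2   -> hr2/lr2 */
    next.hr2 = h; next.lr2 = l;
    next.a0 = ain; next.a1 = m->a0; next.a2 = m->a1;
    *m = next;
}

/* Output ports hres and lres: two more combinational steps after hr2/lr2. */
static uint16_t multip8_result(const Multip8 *m) {
    uint8_t h = m->hr2, l = m->lr2;
    step(m->a2, &h, &l);                   /* hr2 -> hp3/lp3 */
    step(m->a2, &h, &l);                   /* hp3 -> hres/lres */
    return (uint16_t)((h << 8) | l);
}
```

With the inputs held at ain = 104 and bin = 22, three calls to `multip8_clock` leave `multip8_result` equal to 2288. If a new problem is presented on every clock instead, each product appears at the outputs three clocks after its operands were applied, which is the throughput advantage of the pipeline.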

Figure 3.9 shows the linear arrangement of the ports
and registers in the data path for this multiplier, as well
as the placement of shift and add organelles. The flow of
data is down the page. The large size of the full adders
relative to the other organelles is not reflected in the
scale of this figure. The resulting macpitts layout for
this design measures 11848 x 4897.5 microns, which is far
too large to be fabricated in a standard MOSIS run. It
appears that the design must therefore be partitioned in
some way among two or more chips. Ideally, these parti-
tioned "partial multipliers" should be identical in design
if fabrication and testing costs are to be minimized.

3. First Partitioning: 2 Bits, 1 Stage Pipeline

The multip8 design may be partitioned in a number of
ways. The first approach might be to process two multiplier
bits on the chip, using one register stage and then one port
stage to hold the two partial products in a single pipeline
stage, and then pipe the partial result to another identical
chip. Such a design requires 4 chips to do a complete 7 bit
by 8 bit multiplication, with 4 stages of pipelining in all.
; 3-stage pipelined multiplier, product is 16 bit unsigned integer
(program multip8 8                                   ; data path is 8 bits wide
  (def 1 ground)
  (def ain port input (2 3 4 5 6 7 8 9))             ; multiplicand
  (def bin port input (10 11 12 13 14 15 16 17))     ; multiplier
  (def a0 register)
  (def a1 register)
  (def a2 register)
  (def hp0 port internal)
  (def lp0 port internal)
  (def hr0 register)
  (def lr0 register)
  (def hp1 port internal)
  (def lp1 port internal)
  (def hr1 register)
  (def lr1 register)
  (def hp2 port internal)
  (def lp2 port internal)
  (def hr2 register)
  (def lr2 register)
  (def hp3 port internal)
  (def lp3 port internal)
  (def hres port output (18 19 20 21 22 23 24 25))   ; high bits of result
  (def lres port output (26 27 28 29 30 31 32 33))   ; low bits of result
  (def 34 phia)
  (def 35 phib)
  (def 36 phic)
  (def reset signal input 37)
  (def 38 power)
  ; end of definitions
  (always
    (cond ((bit 0 bin)
           (setq hp0 (>> ain))
           (setq lp0 (>> (bit 0 ain) bin)))
          (t
           (setq hp0 0)
           (setq lp0 (>> bin))))
    (cond ((bit 0 lp0)
           (setq hr0 (>> (+ hp0 ain)))
           (setq lr0 (>> (bit 0 (+ hp0 ain)) lp0)))
          (t
           (setq hr0 (>> hp0))
           (setq lr0 (>> (bit 0 hp0) lp0))))
    (cond ((bit 0 lr0)
           (setq hp1 (>> (+ hr0 a0)))
           (setq lp1 (>> (bit 0 (+ hr0 a0)) lr0)))
          (t
           (setq hp1 (>> hr0))
           (setq lp1 (>> (bit 0 hr0) lr0))))
    (cond ((bit 0 lp1)
           (setq hr1 (>> (+ hp1 a0)))
           (setq lr1 (>> (bit 0 (+ hp1 a0)) lp1)))
          (t
           (setq hr1 (>> hp1))
           (setq lr1 (>> (bit 0 hp1) lp1))))
    (cond ((bit 0 lr1)
           (setq hp2 (>> (+ hr1 a1)))
           (setq lp2 (>> (bit 0 (+ hr1 a1)) lr1)))
          (t
           (setq hp2 (>> hr1))
           (setq lp2 (>> (bit 0 hr1) lr1))))

Figure 3.6 Multip8.mac Source File.

    (cond ((bit 0 lp2)
           (setq hr2 (>> (+ hp2 a1)))
           (setq lr2 (>> (bit 0 (+ hp2 a1)) lp2)))
          (t
           (setq hr2 (>> hp2))
           (setq lr2 (>> (bit 0 hp2) lp2))))
    (cond ((bit 0 lr2)
           (setq hp3 (>> (+ hr2 a2)))
           (setq lp3 (>> (bit 0 (+ hr2 a2)) lr2)))
          (t
           (setq hp3 (>> hr2))
           (setq lp3 (>> (bit 0 hr2) lr2))))
    (cond ((bit 0 lp3)
           (setq hres (>> (+ hp3 a2)))
           (setq lres (>> (bit 0 (+ hp3 a2)) lp3)))
          (t
           (setq hres (>> hp3))
           (setq lres (>> (bit 0 hp3) lp3))))
    (cond (reset
           (setq a0 0)
           (setq a1 0)
           (setq a2 0))
          (t
           (setq a0 ain)
           (setq a1 a0)
           (setq a2 a1)))))

Figure 3.7 Multip8.mac Source File (Continued).



Figure 3.10 is a block diagram of this design approach. The
MacPitts source file for this design, given in figure 3.11,
defines another input port, "hin," which should be connected
to the high order 8 bit partial product output of the
previous stage, unless the chip is the first one in the
array. In that case, "hin" is connected to ground (i.e.,
zero). To further reduce area, the reset function was elim-
inated, because it is not in any way essential to the func-
tioning of a multiplier used in a high throughput signal
processing environment such as is envisioned for this
design.

This arrangement of identical processing elements
connected in a linear array to produce a pipelined result is
similar in concept to the systolic array approach formulated
by Kung [Ref. 18], although he was more generally concerned
with individual processing elements of greater complexity
than that of multip8a cells.

Figure 3.8 Use of Ports and Registers in multip8.mac.

The macpitts layout of multip8a has outline dimen-
sions of 5848 x 6140 microns. The data path and control
unit only occupy approximately 3000 x 2500 microns. The
overall chip is large compared to its "working circuitry"
because of the need to place 53 pin pads around only three
sides of the perimeter. This design does not approach full
utilization of the available 6890 x 6300 micron silicon
area.



4. Second Partitioning: 4 Bits, 2 Stage Pipeline

It seems clear that more of the design will fit on 
the chip and still not exceed the maximum size for 



[Figure 3.9 diagram. Data path elements, from top to bottom:
bin port; ain port; right shift, hp0 port; right shift, lp0
port; full adder, right shift, hr0 register; right shift, lr0
register; a0 register; full adder, right shift, hp1 port;
right shift, lp1 port; full adder, right shift, hr1 register;
right shift, lr1 register; a1 register; full adder, right
shift, hp2 port; right shift, lp2 port; full adder, right
shift, hr2 register; right shift, lr2 register; a2 register;
full adder, right shift, hp3 port; right shift, lp3 port;
full adder, right shift, hres port; right shift, lres port.
Side annotations show the 3 phase clock and the control lines
"bit 0 to control," "select from control," "setq 0 from
control," and "MSB filled from control."]
Figure 3.9 Data Path Architecture of Multip8 Chip. 



fabrication. Design multip8b (source file shown in figure
3.12) tests four bits of the multiplier on one chip; there-
fore, only two of these chips are needed to do a complete
multiplication. Essentially this is just a doubled version
of multip8a. The MacPitts layout is 7130 x 6140 microns for
the 5 micron option, which is too large to fabricate. When
rerun with the 4 micron option, the multip8b chip has
satisfactory dimensions: 5884 x 6024 microns.

[Figure 3.10 diagram: cascaded multip8a chips producing a 16 bit product.]

Figure 3.10 Block Diagram of First Partitioning.
; 1 stage of a 4-stage pipelined multiplier
; product is a 16 bit unsigned integer
(program multip8a 8                                  ; data path is 8 bits wide
  (def 1 ground)
  (def ain port input (2 3 4 5 6 7 8 9))             ; multiplicand input
  (def bin port input (10 11 12 13 14 15 16 17))     ; multiplier input
  ; this port also receives the lower 8 bits of the partial product
  (def hin port input (18 19 20 21 22 23 24 25))     ; upper 8 bits of
  ; partial product from preceding stage, zero if first stage.
  (def aout port output (26 27 28 29 30 31 32 33))   ; multiplicand output
  (def hout port output (34 35 36 37 38 39 40 41))   ; upper 8 bits of
  ; partial product output
  (def lout port output (42 43 44 45 46 47 48 49))   ; lower 8 bits of
  ; partial product output and shifted multiplier output
  (def a1 register)
  (def hr1 register)
  (def lr1 register)
  (def 50 phia)
  (def 51 phib)
  (def 52 phic)
  (def 53 power)
  ; end of definitions
  (always
    (cond ((bit 0 bin)
           (setq hr1 (>> (+ hin ain)))
           (setq lr1 (>> (bit 0 (+ hin ain)) bin)))
          (t
           (setq hr1 (>> hin))
           (setq lr1 (>> (bit 0 hin) bin))))
    (cond ((bit 0 lr1)
           (setq hout (>> (+ hr1 ain)))
           (setq lout (>> (bit 0 (+ hr1 ain)) lr1)))
          (t
           (setq hout (>> hr1))
           (setq lout (>> (bit 0 hr1) lr1))))
    (setq a1 ain)
    (setq aout a1)))

Figure 3.11 Multip8a.mac Source File.

5. Third Partitioning: 4 Bits, 4 Stage Pipeline

By replacing every internal port with a register,
and providing two additional corresponding pipeline regis-
ters for the multiplicand, the delay per pipeline stage can
be reduced by a factor of approximately two, because the
adders drive a register directly instead of through a port
and another adder. The clock rate can therefore be approxi-
mately doubled. This modification has another attractive
feature in that it allows the output port to be driven
; 2 stages of a 4-stage pipelined multiplier
; product is a 16 bit unsigned integer
(program multip8b 8                                  ; data path is 8 bits wide
  (def 1 ground)
  (def ain port input (2 3 4 5 6 7 8 9))             ; multiplicand input
  (def bin port input (10 11 12 13 14 15 16 17))     ; multiplier input
  ; this port also receives the lower 8 bits of the partial product
  (def hin port input (18 19 20 21 22 23 24 25))     ; upper 8 bits of
  ; partial product from preceding stage, zero if first stage.
  (def aout port output (26 27 28 29 30 31 32 33))   ; multiplicand output
  (def hout port output (34 35 36 37 38 39 40 41))   ; upper 8 bits of
  ; partial product output
  (def lout port output (42 43 44 45 46 47 48 49))   ; lower 8 bits of
  ; partial product output and shifted multiplier output
  (def a1 register)
  (def a2 register)
  (def hr1 register)
  (def lr1 register)
  (def hp1 port internal)
  (def lp1 port internal)
  (def hr2 register)
  (def lr2 register)
  (def 50 phia)
  (def 51 phib)
  (def 52 phic)
  (def 53 power)
  ; end of definitions
  (always
    (cond ((bit 0 bin)
           (setq hr1 (>> (+ hin ain)))
           (setq lr1 (>> (bit 0 (+ hin ain)) bin)))
          (t
           (setq hr1 (>> hin))
           (setq lr1 (>> (bit 0 hin) bin))))
    (cond ((bit 0 lr1)
           (setq hp1 (>> (+ hr1 ain)))
           (setq lp1 (>> (bit 0 (+ hr1 ain)) lr1)))
          (t
           (setq hp1 (>> hr1))
           (setq lp1 (>> (bit 0 hr1) lr1))))
    (cond ((bit 0 lp1)
           (setq hr2 (>> (+ hp1 a1)))
           (setq lr2 (>> (bit 0 (+ hp1 a1)) lp1)))
          (t
           (setq hr2 (>> hp1))
           (setq lr2 (>> (bit 0 hp1) lp1))))
    (cond ((bit 0 lr2)
           (setq hout (>> (+ hr2 a1)))
           (setq lout (>> (bit 0 (+ hr2 a1)) lr2)))
          (t
           (setq hout (>> hr2))
           (setq lout (>> (bit 0 hr2) lr2))))
    (setq a1 ain)
    (setq a2 a1)
    (setq aout a2)))

Figure 3.12 Multip8b.mac Source File.
; 4 stages of an 8-stage pipelined multiplier
; product is a 16 bit unsigned integer
(program multip8c 8                                  ; data path is 8 bits wide
  (def 1 ground)
  (def ain port input (2 3 4 5 6 7 8 9))             ; multiplicand input
  (def bin port input (10 11 12 13 14 15 16 17))     ; multiplier input
  ; this port also receives the lower 8 bits of the partial product
  (def hin port input (18 19 20 21 22 23 24 25))     ; upper 8 bits of
  ; partial product from preceding stage, zero if first stage.
  (def aout port output (26 27 28 29 30 31 32 33))   ; multiplicand output
  (def hout port output (34 35 36 37 38 39 40 41))   ; upper 8 bits of
  ; partial product output
  (def lout port output (42 43 44 45 46 47 48 49))   ; lower 8 bits of
  ; partial product output and shifted multiplier output
  (def a1 register)
  (def a2 register)
  (def a3 register)
  (def a4 register)
  (def hr1 register)
  (def lr1 register)
  (def hr2 register)
  (def lr2 register)
  (def hr3 register)
  (def lr3 register)
  (def hr4 register)
  (def lr4 register)
  (def 50 phia)
  (def 51 phib)
  (def 52 phic)
  (def 53 power)
  ; end of definitions
  (always
    (cond ((bit 0 bin)
           (setq hr1 (>> (+ hin ain)))
           (setq lr1 (>> (bit 0 (+ hin ain)) bin)))
          (t
           (setq hr1 (>> hin))
           (setq lr1 (>> (bit 0 hin) bin))))
    (cond ((bit 0 lr1)
           (setq hr2 (>> (+ hr1 a1)))
           (setq lr2 (>> (bit 0 (+ hr1 a1)) lr1)))
          (t
           (setq hr2 (>> hr1))
           (setq lr2 (>> (bit 0 hr1) lr1))))
    (cond ((bit 0 lr2)
           (setq hr3 (>> (+ hr2 a2)))
           (setq lr3 (>> (bit 0 (+ hr2 a2)) lr2)))
          (t
           (setq hr3 (>> hr2))
           (setq lr3 (>> (bit 0 hr2) lr2))))
    (cond ((bit 0 lr3)
           (setq hr4 (>> (+ hr3 a3)))
           (setq lr4 (>> (bit 0 (+ hr3 a3)) lr3)))
          (t
           (setq hr4 (>> hr3))
           (setq lr4 (>> (bit 0 hr3) lr3))))
    (setq hout hr4)
    (setq lout lr4)
    (setq a1 ain)
    (setq a2 a1)
    (setq a3 a2)
    (setq a4 a3)
    (setq aout a4)))

Figure 3.13 Multip8c.mac Source File.
directly by a register rather than by an adder. Thus, the
output data is valid sooner after the completion of a clock
cycle than it was in the case of multip8b.

Some room to spare on the multip8b 4 micron layout
leaves hope that this four stage pipeline algorithm, figure
3.13, may be feasible. In fact, the macpitts layout for
multip8c measures 8218 x 6140 microns in 5 micron tech-
nology. In 4 micron technology the chip measures 6766 x
6024 microns, which consumes almost 94 percent of the
maximum allowable chip area. This is a good indication that
the limit may in fact have been reached on obtaining any
more elaborate design variations for the multiplier which
can be fabricated by the standard MOSIS facilities.
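The size comparisons in this chapter can be checked mechanically against the 6890 x 6300 micron MOSIS limit. The sketch below is an illustration only; it assumes that a layout may be placed in either orientation, and the helper names `fits_mosis` and `area_fraction` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum MOSIS project size quoted in the text, in microns. */
#define MAX_W 6890.0
#define MAX_H 6300.0

/* Assumption: the layout may be placed in either orientation. */
static bool fits_mosis(double w, double h) {
    assert(w > 0.0 && h > 0.0);
    return (w <= MAX_W && h <= MAX_H) || (h <= MAX_W && w <= MAX_H);
}

/* Fraction of the maximum project area consumed by a w x h layout. */
static double area_fraction(double w, double h) {
    return (w * h) / (MAX_W * MAX_H);
}
```

For example, `fits_mosis(7130, 6140)` (multip8b at 5 microns) is false, `fits_mosis(6766, 6024)` (multip8c at 4 microns) is true, and `area_fraction(6766, 6024)` is about 0.939, which is the "almost 94 percent" figure above. The orientation assumption does not change any of the verdicts for the layouts reported here.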

A summary of statistics produced by macpitts for all
the multiplier designs explored in this chapter is given in
table I. Each line represents a different cif file, some of
which may be derived from the same source file, with the
only difference being the invocation options. The root of
each entry in the "DESIGN" column corresponds to the name of
a multiplication algorithm introduced previously in this
chapter. To clarify the notation of the "DESIGN" column,
note that a digit appended to the algorithm name gives the
minimum feature size selected, in microns. Where no such
digit is appended, the minimum feature size is 5 microns.

E. DESIGN VALIDATION 

1. Functional Simulation

Before proceeding with fabrication it is necessary
to validate the multip8c4 design by functional simulation,
design rule checking, and node extraction with subsequent
event simulation.
TABLE I

Statistics For MacPitts Multiplier Chip Designs

Note: The (nocd) run was made with the noopt-c and noopt-d
options in effect.




In any functional simulation the first issue to
address is, "How exhaustive shall the simulation be?" Truly
exhaustive testing of multip8c4 is a formidable task, at
best. The number of different electrically possible combi-
nations of bits for the three input ports (ain, bin and
hin) is

$$(2^8)^3 = 2^{24} = 16{,}777{,}216.$$

Then, there are four internal pipeline stages. Therefore,
ideally, every sequence of 4 of these 16,777,216 inputs
should be tested, because there should be no restrictions on
the ordering of problems in the pipeline. This considera-
tion increases the number of possible states to

$$(16{,}777{,}216)^4 = 2^{96} \approx 7.92 \times 10^{28}\ \text{states}.$$

Each state transition requires five transitions of the raw
clock, as will be recalled from figure 2.4. It is reason-
able to assume a raw clock frequency of 10 MHz for an NMOS
circuit. For the master-slave flip flops used in MacPitts
this translates to a state transition rate of 2 MHz. From
this assumption the time to cover all states of this circuit
is calculated to be

$$\frac{7.92 \times 10^{28}\ \text{states}}{2 \times 10^{6}\ \text{states/s}} = 3.96 \times 10^{22}\ \text{s},$$

$$\frac{3.96 \times 10^{22}\ \text{s}}{8.64 \times 10^{4}\ \text{s/day}} = 4.58 \times 10^{17}\ \text{days},$$

$$\frac{4.58 \times 10^{17}\ \text{days}}{365\ \text{days/year}} = 1.26 \times 10^{15}\ \text{years}.$$

Therefore testing every electrically possible state, even
once, is obviously impractical.
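The estimate above can be reproduced with a few lines of C. This is a sketch of the same arithmetic under the same stated assumptions (24 input bits, four pipeline stages, a 10 MHz raw clock, five raw-clock transitions per state); the function name `exhaustive_test_years` is hypothetical.

```c
#include <assert.h>

/* Years needed to step a circuit through every ordered sequence of inputs,
   assuming each of `stages` pipeline slots may hold any of 2^input_bits
   input combinations, and one state transition takes `raw_transitions`
   transitions of a raw clock running at raw_clock_hz. */
static double exhaustive_test_years(int input_bits, int stages,
                                    double raw_clock_hz, int raw_transitions) {
    assert(input_bits > 0 && stages > 0 && raw_transitions > 0);
    double combos = 1.0;
    for (int i = 0; i < input_bits; i++)
        combos *= 2.0;                              /* 2^24 for ain, bin, hin */
    double states = 1.0;
    for (int s = 0; s < stages; s++)
        states *= combos;                           /* (2^24)^4 = 2^96 */
    double rate = raw_clock_hz / raw_transitions;   /* 10 MHz / 5 = 2 MHz */
    double seconds = states / rate;
    return seconds / 86400.0 / 365.0;               /* s/day, days/year */
}
```

Calling `exhaustive_test_years(24, 4, 10e6, 5)` gives roughly 1.26e15 years. Setting `stages` to 1 models the single pass over every input combination considered next; converted back to seconds, it is a little under 8.4 seconds.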

If each 24 bit input combination were tested only
once, without regard for the order in which the tests were
conducted, the time required would be only

$$\frac{16{,}777{,}216}{2 \times 10^{6}\ \text{states/s}} \approx 8.39\ \text{seconds}.$$

It should be remembered that, in its intended application,
the number of expected input combinations to multip8c is
considerably smaller. There are only 128 x 256 = 32768
possible 7x8 bit multiplication problems (their products
range from 0 to 255 x 127 = 32385). Each of these will have
hin = 0 on the first chip. The second chip will have only
one unique set of inputs passed to it by the first chip for
each of these 32768 problems. Therefore, the total number of
different input combinations of ain, bin and hin that will
be encountered in actual operation is no greater than
2 x 32768 = 65536. The precise number is somewhat smaller
still, because some problems, such as those which have zero
for the multiplier or multiplicand, will output a zero from
hout in the first chip to hin of the second chip, thus
duplicating the first-chip set of inputs for some other
problem.
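The "somewhat smaller" precise count can be obtained by brute force. The sketch below enumerates every 7-bit by 8-bit problem, applies the four shift-and-add steps of one multip8c chip (the same arithmetic as the "values" program of figure 3.14), and marks each (ain, bin, hin) combination seen by the first and second chips in a 2^24-bit table. The helpers `chip`, `mark` and `count_distinct_inputs` are hypothetical names, and the model assumes the second chip receives aout = ain, bin = lout and hin = hout from the first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four shift-and-add steps of one multip8c chip (as in the "values" program). */
static void chip(unsigned ain, unsigned bin, unsigned hin,
                 unsigned *hout, unsigned *lout) {
    unsigned h = hin, l = bin;
    for (int c = 0; c < 4; c++) {
        if (l & 1u) h += ain;
        l >>= 1;
        if (h & 1u) l += 128;
        h >>= 1;
    }
    *hout = h;
    *lout = l;
}

static uint8_t seen[1u << 21];   /* one bit per 24-bit (ain, bin, hin) combination */

/* Mark a combination; return 1 if it had not been seen before. */
static int mark(unsigned ain, unsigned bin, unsigned hin) {
    assert(ain < 256u && bin < 256u && hin < 256u);
    uint32_t key = ((uint32_t)ain << 16) | ((uint32_t)bin << 8) | hin;
    uint8_t mask = (uint8_t)(1u << (key & 7u));
    if (seen[key >> 3] & mask) return 0;
    seen[key >> 3] |= mask;
    return 1;
}

/* Distinct input combinations seen by two cascaded chips over every problem. */
static long count_distinct_inputs(void) {
    long count = 0;
    memset(seen, 0, sizeof seen);
    for (unsigned ain = 0; ain < 128; ain++) {
        for (unsigned bin = 0; bin < 256; bin++) {
            unsigned hout, lout;
            count += mark(ain, bin, 0);      /* first chip: hin grounded */
            chip(ain, bin, 0, &hout, &lout);
            count += mark(ain, lout, hout);  /* second chip: bin <- lout, hin <- hout */
        }
    }
    return count;
}
```

The first chip alone contributes all 32768 combinations with hin = 0. The second chip adds only those with a nonzero hin, so the result falls strictly between 32768 and 65536.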

When using the macpitts interpreter to run a func-
tional simulation, at least fifteen seconds must be allowed
for computing the changes at each clock cycle. This fact
makes testing even all expected input combinations imprac-
tical. Instead, one random problem is chosen:
104 x 22 = 2288. The product 2288 is represented as
hout = 00001000 = 8 decimal and lout = 11110000 = 240
decimal, since (256 x 8) + 240 = 2288. Figures D.2 through
D.6 in Appendix D show interpreter output files for each of
the 8 clock cycles needed to produce the result, and for a
ninth clock cycle to demonstrate that the output is not
subject to uncommanded changes. Between clock cycles 4 and 5
the inputs were changed to simulate two chips in cascade.
The results are correct, indicating proper behavior of the
specification algorithm.
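The clock-by-clock behavior of two cascaded multip8c chips, including the 8-cycle latency mentioned above, can be sketched with a register-level C model. This is an illustrative sketch, not the thesis's simulator. It assumes that `(>> x)` fills the MSB with 0, that `(>> b x)` fills it with bit `b`, and that `+` is an 8-bit adder with the carry discarded. It also assumes both chips share one clock, with the second chip sampling the first chip's register-driven outputs before the edge. `Multip8c`, `multip8c_clock`, `cascade_clock` and the accessor functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional 8-bit add, then a one-place right shift of the (h, l) pair with
   bit 0 of the sum moving into the MSB of l (one cond form of multip8c.mac). */
static void step(uint8_t a, uint8_t *h, uint8_t *l) {
    uint8_t sum = (*l & 1u) ? (uint8_t)(*h + a) : *h;
    *l = (uint8_t)((*l >> 1) | ((sum & 1u) << 7));
    *h = (uint8_t)(sum >> 1);
}

typedef struct {
    uint8_t a[4];            /* a1..a4 multiplicand registers */
    uint8_t hr[4], lr[4];    /* hr1..hr4 and lr1..lr4 */
} Multip8c;

/* Output ports are driven directly by registers a4, hr4 and lr4. */
static uint8_t aout(const Multip8c *c) { return c->a[3]; }
static uint8_t hout(const Multip8c *c) { return c->hr[3]; }
static uint8_t lout(const Multip8c *c) { return c->lr[3]; }

/* One clock of one chip: each register stage is fed by exactly one step. */
static void multip8c_clock(Multip8c *c, uint8_t ain, uint8_t bin, uint8_t hin) {
    assert((ain & 0x80u) == 0);          /* MSB of the multiplicand must be zero */
    Multip8c next;
    uint8_t h = hin, l = bin;
    step(ain, &h, &l);                   /* hr1/lr1 from hin, bin and ain */
    next.hr[0] = h; next.lr[0] = l; next.a[0] = ain;
    for (int s = 1; s < 4; s++) {        /* hr(s+1) from hr(s) using a(s) */
        h = c->hr[s - 1]; l = c->lr[s - 1];
        step(c->a[s - 1], &h, &l);
        next.hr[s] = h; next.lr[s] = l;
        next.a[s] = c->a[s - 1];
    }
    *c = next;
}

/* Two chips on a shared clock: chip 2 samples chip 1's outputs before the edge. */
static void cascade_clock(Multip8c *c1, Multip8c *c2, uint8_t ain, uint8_t bin) {
    uint8_t a = aout(c1), h = hout(c1), l = lout(c1);
    multip8c_clock(c1, ain, bin, 0);     /* hin of the first chip is grounded */
    multip8c_clock(c2, a, l, h);         /* lout feeds bin, hout feeds hin */
}
```

Holding the inputs at 104 and 22, the first chip presents hout = 39 and lout = 1 after four clocks, which matches the first line of the sample run in figure 3.14. After eight clocks the second chip presents hout = 8 and lout = 240, which is 2288. When a new problem is applied on every clock, each product emerges eight clocks after its operands.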

A source listing for the program "values" appears in
figure 3.14, together with a sample run using the problem
given above. This program allows generation of the multip8c
result given any combination of ain, bin and hin values
entered from the terminal keyboard.

2. Design Rule Checking

The reality of the claim that MacPitts designs are 
"correct by construction" can be tested. The multip8c.cif 
#include <stdio.h>

int main(void) /* interactive simulation of multip8c chip */
{
    unsigned int ain, bin, hin, hout, lout, result;
    unsigned int test1, test2, c;

    printf("Type ^C anytime to quit.\n\n");

    /* Loop until an interrupt (or end of input) is signaled from the keyboard. */
    for (;;) {
        /* Read input values from keyboard. */
        printf("Enter ain... ");
        if (scanf("%u", &ain) != 1)
            break;
        printf("Enter bin... ");
        if (scanf("%u", &bin) != 1)
            break;
        printf("Enter hin... ");
        if (scanf("%u", &hin) != 1)
            break;

        /* Compute the results: first initialize output registers. */
        lout = bin;
        hout = hin;

        /* Simulate multip8c algorithm: one shift-and-add step per stage. */
        for (c = 1; c <= 4; c++) {
            test1 = lout & 1;
            if (test1 == 1)
                hout = hout + ain;
            lout = lout >> 1;
            test2 = hout & 1;
            if (test2 == 1)
                lout = lout + 128;
            hout = hout >> 1;
        }

        /* Put output register values into concatenated decimal form. */
        result = 256 * hout + lout;

        /* Display all values on the screen. */
        printf("ain=%-4u bin=%-4u hin=%-4u hout=%-4u lout=%-4u result=%-5u\n\n",
               ain, bin, hin, hout, lout, result);
    }
    return 0;
}

*************** SAMPLE RUN ***************

% values
Type ^C anytime to quit.

Enter ain... 104
Enter bin... 22
Enter hin... 0
ain=104 bin=22 hin=0 hout=39 lout=1 result=9985

Enter ain... 104
Enter bin... 1
Enter hin... 39
ain=104 bin=1 hin=39 hout=8 lout=240 result=2288

Enter ain... ^C
%

Figure 3.14 Values: Program to Compute Multip8c Output.
file was checked for design rule errors by running it
through the Stanford "drc" program, via "ell" [Ref. Is pp.
147-151] to reformat the file. The command sequence is:

% cif multip8c.cif -gng
% ell multip8c.co
% drc multip8c.sco

There are two problems, however, with using drc on this
design. One is that the design rules used by MacPitts are
not the standard Mead-Conway rules [Ref. 2: pp. 47-51], but
are a combination of these and the MOSIS design rules, which
include buried contacts [Ref. 2: page 133]. Buried contacts
are not recognized by "drc." The other problem is that the
"cif" program does not correctly read .cif files which use
the 200 centimicron lambda dimension; round-off error is
introduced. Therefore, the design rule check can only be
performed on multip8c5, not on multip8c4, which is the
version to be fabricated.

The results of this drc run, thus caveated, comprised
2 types of stated errors, both of which are spurious. One
is a "poly to diffusion contact separation" error in the
controller, where macpitts abuts two contacts, one to poly
and one to diffusion, but both through the same overlying
metal conductor. The intent of the design rule checker, in
this instance, is to forewarn of the possibility of a short
circuit; a short circuit is in fact the desired result of
this unorthodox structure. The other stated error is an
"implant surround" error in the register clock. This struc-
ture is flagged because the buried contact to that layer
was ignored by drc. Based on this non-ideal but only avail-
able check of design rules, it was concluded that the
multip8c5.cif file does define processable mask layers. It
is assumed that the multip8c4.cif file is also processable,
because it differs only in scale from multip8c5.cif, except
for the pads, whose design is supposedly from a standard
library supplied by MOSIS.
3. Node Extraction and Event Simulation

The node extraction program "extract," which is part of the Stanford VLSI design tools, does not accurately interpret .cif files with lambda equal to 200 centimicrons. Fortunately, the "mextra" program, written at Berkeley,² can accommodate both 200 and 250 centimicron cif files.

To obtain an extraction and simulation of the multip8c design in 4 micron size, the corresponding cif file, multip8c4.cif, was converted to the ".ca" format used by the Berkeley "caesar" layout editor. Then labels for all the pads were added to the design using caesar so that mextra would know which nodes are to be accessible for monitoring. Before exiting caesar, a new cif file, mul8c.cif, was written using the caesar command

:cif -p mul8c

The node extraction is made by issuing the command

% mextra mul8c

The result of the mextra run is a .sim file suitable for 
input to the "esim" event simulator [Bef. Is pp. 152-155], 
and also a .log file (figure 3.15) containing summary statistics of the extraction.




Figure 3.15 Mextra .log File for Mul8c.cif.



2 See Appendix C. 
The simulation, using the extraction of mul8c.cif, was set up to perform the same tests used in the MacPitts interpreter session of multip8c. To do this, two macro files were created. One defines the three-phase clock sequence, declares which nodes to watch, and sets the inputs to the values that simulate the problem 104x22. The second macro file, which was designed to be read in at the midpoint of the simulation, redefines the input values to make the chip perform like the second multip8c unit in the pipeline. Both files are listed in figure 3.16.



% cat mul8c.macro

K phia 11011 phib 10000 phic 10001
W ain ain7 ain6 ain5 ain4 ain3 ain2 ain1 ain0
W bin bin7 bin6 bin5 bin4 bin3 bin2 bin1 bin0
W hin hin7 hin6 hin5 hin4 hin3 hin2 hin1 hin0
W hout hout7 hout6 hout5 hout4 hout3 hout2 hout1 hout0
W lout lout7 lout6 lout5 lout4 lout3 lout2 lout1 lout0
W aout aout7 aout6 aout5 aout4 aout3 aout2 aout1 aout0
W clock phia phib phic
h ain6 ain5 ain3 bin4 bin2 bin1
l ain7 ain4 ain2 ain1 ain0 bin7 bin6 bin5 bin3 bin0
l hin7 hin6 hin5 hin4 hin3 hin2 hin1 hin0

% cat mul8c.macro2

h hin5 hin2 hin1 hin0 bin0
l bin7 bin6 bin5 bin4 bin3 bin2 bin1



Figure 3.16 Two Macro Driver Files for Event Simulation. 



The record of a simulation run using these files is contained in Appendix D. It shows the same correct results obtained with the MacPitts functional interpreter. Note, however, that when the "I" command is given to esim, all the circuit nodes are initialized to values over which the user has no control. Therefore, the values of the output ports are not meaningful until the fourth clock cycle, even though they are defined during initialization.
The event simulation result is encouraging evidence that MacPitts can produce, in at least one instance, a mask-level description that correctly reflects a circuit design with algorithmic behavior specified by the designer. Further validation evidence was obtained by performing an extraction and event simulation on multip8c5.cif, the 5 micron version of the multiplier. This extraction could be done using the Stanford program; the result was the same as for mextra. The event simulation produced a correct result for the same exercise. It was concluded, therefore, that the design was ready for fabrication.

F. SUMMARY OF ACTIVITIES IN THE MACPITTS DESIGN CYCLE

A recommended pattern of steps to follow in the MacPitts design cycle can be summarized by presenting the sequence of UNIX commands issued by the designer for a typical case. This sequence divides into two paths after the cif file is created, depending on whether 4 micron or 5 micron minimum feature size is selected. For the 4 micron option the caesar/mextra tools must be used. For the 5 micron option it is more convenient to use extract, a program which recognizes node labels furnished by MacPitts with the cif user extension 0.

As a starting point, it is assumed that the designer 
already has formulated a precise idea of what behavior the 
chip is to exhibit, and has translated the behavioral speci- 
fication into MacPitts language. 

The 5 micron path, using the multip8c.mac source file as 
an example, is as follows: 

% vi multip8c.mac 

(Create the source file.) 

% macpitts multip8c int herald






(Run the interpreter to debug the source file and verify the 
functional correctness of the specification. Save states as 
desired using the "p" interpreter command, renaming files 
from a second terminal keyboard to prevent overwriting. 
Quit the interpreter.) 

% script 

(Start a recording session for the terminal screen.) 

% macpitts multip8c 5u herald 

(Generate 5 micron multip8c.cif and complete design statistics.)



% mv multip8c.cif multip8c5.cif
(Rename cif file to proclaim that it is a 5 micron design.) 

% ctrl-D 

(Stop the recording session.) 

% print typescript

(Get hardcopy of compiler statistics and heralds.) 

% cif multip8c5.cif -gng 
% ell multip8c5.co 
% drc multip8c5.sco

(Obtain design rules check.) 

% extract multip8c5 

(Obtain a node extract.) 

% vi multip8c5.sym

(Change spelling of VDD and ground node labels to Vdd and GND, respectively.)

% sim multip8c5






(Obtain the multip8c5.sim file.) 

% vi multip8c.macro1

(Create one or more testing sequence files for the event simulator. See the "esim" section of Appendix C for details.)

% script 

% esim multip8c5.sim multip8c.macro1
(Perform event simulation of chip.) 

% ctrl-D 

% print typescript 
% vi multip8c5.cif

("Comment out" the user extension 0 lines at the beginning 
of this file by enclosing them all in one set of parentheses 
followed by a semicolon. See the "cifplot" section of 
Appendix C for details.) 

% stipple multip8c5.cif

(Obtain stipple plot on the Versatec plotter.)

The 4 micron path, using the same example, contains 
exactly the same steps through the interpreter run, then 
continues as follows: 

% script 

% macpitts multip8c 4u herald 
(Generate 4 micron multip8c.cif and complete statistics.) 

% mv multip8c.cif multip8c4.cif
(Rename cif file to proclaim that it is a 4 micron design.)

% ctrl-D 

% print typescript




% cif2ca multip8c4.cif

(Convert cif to caesar format. Benign warnings are issued when user extension 0 lines are encountered.)

% mv project.ca multip8c4.ca

(Give the top level caesar file a suitable name.) 

% caesar multip8c4 

(Use caesar to affix labels to each bonding pad, then output a new cif file using :cif -p cmul8c4. See the "caesar" section of Appendix C for details. Quit caesar.)

% mextra cmul8c4 

(Obtain a node extraction.) 

% vi multip8c.macro1

(The testing sequence files are identical to those of the 5 micron case.)

% script 

% esim cmul8c4.sim multip8c.macro1
(Perform event simulation of chip.) 

% ctrl-D 

% print typescript 
% stipple cmul8c4.cif 

(Obtain stipple plot on Versatec. There is no need to worry about user extension 0 if the cif file was created by caesar.)
IV. MACPITTS PERFORMANCE



A. LAYOUT ERRORS AND INEFFICIENCIES
1. Inefficiencies

Appendix E contains photographs of an AED 767 color graphics terminal screen displaying the MacPitts chip layouts for each of the six multipliers discussed. The presentations were generated by the caesar VLSI circuit editor [Ref. 6]. Examination of these layouts, aided by the zoom-in feature of caesar, prompts several observations about MacPitts' performance.

In any VLSI circuit layout a primary goal is to cover the available silicon area as densely as possible with circuitry. Only a variable, but generally small, fraction of the silicon area within the bounding box of MacPitts layouts is covered with circuitry. This is due in part to the rigidity of the target architecture, which requires data path organelles to be laid out in a strictly linear fashion. The most serious waste of space in the examples explored, however, is caused by the inability of MacPitts to install bonding pads on all four sides of the chip. The left side is never available for this purpose due to certain algorithmic simplifications made by the authors of MacPitts [Ref. 16: p. 13]. A three-sided arrangement of pads stretches the outline dimensions, particularly in designs which specify a large number of external connections. All of the partitioned multiplier algorithms presented in the previous chapter (multip8a, multip8b, and multip8c) are in this category.

One may consider the possibility of filling the large void above the useful circuitry in multip8c4, for example, with another identical instantiation of the multip8c4 layout, minus the pads, and thereby producing a complete 8 bit multiplier on one chip. Eight pads for the hin port could then also be eliminated. The cell movement and yank/put commands of caesar would make this operation possible with a minimum of drudgery. But the interconnections between the two instantiations of the multip8c4 modules would still require tedious manual layout, and would be very subject to human error. Such hand crafting, minus the interconnection modifications, was in fact attempted. Appendix E contains a photograph displaying the results of this effort, named multip8c4d to denote "double." It clearly demonstrates that the synergistic use of MacPitts with caesar is feasible.

To pursue the manual editing approach very far would be to abandon the basic concept of silicon compilation as defined from the outset. Nevertheless, editing is required if one is to obtain efficient use of silicon resources. The appreciation of silicon compilers like MacPitts still awaits a future in which performing such manual editing is more costly (in custom designs intended only for small volume production) than the silicon area wasted in a suboptimal layout. One can predict that such a future will arrive, just as it did when the cost of memory hardware dropped, thus solving an analogous problem: whether to waste memory but write clear programs, or conserve memory fully at the cost of monumental programming effort.

A lack of compactness detracts from more than economy of production, however. There are penalties in circuit operating speed as well. A closer look at the details of MacPitts layouts reveals inefficiencies which directly affect circuit performance. In general, metal and polysilicon interconnections are much longer than the minimum an experienced human layout artist would be expected to produce, even when both are limited to using right-angle (Manhattan) layout rules. For example, all of the output data bits generated at the far right side of the data path must be routed back to the left along the entire length of the data path, then up (or down), over to the right again for the entire length of the data path, and finally down (or up) again to reach the bonding pads. In the multip8c4 layout, MacPitts uses wire runs of up to 18 mm to route data bits from their sources to bonding pads which, in some cases, are less than 1 mm direct distance from the source. The problem lies in the inability of MacPitts to jump over the metal power/ground bus frame in making connections from the data path to bonding pads. This
Figure 4.1 Data Path Output Routing.



problem is illustrated in figure 4.1. The experienced user can help equalize interconnection lengths somewhat by assigning output ports only to the lowest and highest numbered pins.

MacPitts, therefore, requires that the user provide 
a functional specification which is enlightened by knowledge 
of the layout limitations if optimum performance is to be 
obtained. This is an area for improvement in pursuit of the 
silicon compiler ideal. 

Another layout problem is more difficult to deal with: the excessive length of wiring between the control unit and the data path. This could be improved by centering the control unit under the data path, which would require changing the MacPitts source code in some undetermined way. As currently written, MacPitts always begins the control unit at the left margin.

There are also many instances of dead-ended wires in MacPitts layouts. These "roads to nowhere" occur when MacPitts extends runs beyond the last point of interconnection. They occur most frequently on the organelles, not all of whose capabilities may be used by the behavioral specification in a given instance. This appears to result from an attempt to use the same organelle for as many different applications as possible, apparently to control the size of the library. Untrimmed wires of this variety certainly add to inter-node capacitance, although not to the extent that inefficient routing does. Nevertheless, they surely reduce the operating speed of the circuit, and make operation noisier and perhaps less reliable at high frequencies.

2. Errors 

In addition to the layout inefficiencies described, there is another problem with MacPitts layouts. At least one input file has been known to produce a layout containing a fatal error. Kelly [Ref. 19] attempted to use MacPitts to produce a butterfly switching element chip. His design (called kchip2) has a much simpler data path than the multip8c pipeline multiplier, but it has a larger control unit. It also includes some finite state machine sequencer units which serve the independent processes he uses in the design. These are laid out to the right of the data path. The MacPitts-designed layout of this circuit places a direct short circuit across the three clock bus lines. A picture of the portion of the chip where the error occurs is included in Appendix E. The problem arises because the clock bus contains "vias" where it must be extended from the data path to horizontally adjacent elements in the design. These "vias" allow the metal bus lines to cross vertical metal frame power or ground lines via a brief transition to the polysilicon layer, then back to the metal layer. MacPitts, however, apparently does not check for the presence of any intersecting vertical polysilicon runs to the control unit which may be placed at the same horizontal coordinate as the clock bus vias. None of the multip8 series of designs has any control lines entering the extreme right end of the data path. Therefore, the vias are safe, and the problem does not occur. It is interesting to note, however, that MacPitts still extends the clock bus well to the right beyond the point of last use, and includes a dead-end set of vias to jump over the data path frame, even though there is no need for that extension in the multip8 family of designs. It may be concluded from these observations that this problem is latent in all MacPitts designs, and one would do well to examine the control unit wiring in the vicinity of the clock bus at the right end of all frames. Caesar can be used, if necessary, to adjust the local wiring slightly to route the offending control line away from the clock vias.
B. ORGANELLES VS. STANDARD CELLS



This section briefly examines some comparative aspects of the Stanford standard cell approach used by Newkirk and Mathews [Ref. 20] and the organelles used in MacPitts.

Both standard cells and organelles are laid out as bit slices. It was hoped that there would be a one-to-one functional correspondence between at least some of the catalogued standard cells and the organelles, which could form a basis for comparison. Unfortunately, there is very little functional correspondence, let alone structural correspondence, between the two. The standard cells contain only dynamic storage elements, and use a two-phase clock. The MacPitts organelles use a three-phase clock, and the only memory elements available are static master-slave flip-flop registers. The standard cells are designed for matched pitch; that is, they can be directly abutted, in many cases, to form full length words and arrays. Organelles, on the other hand, generally require some margin around them for interconnections (called "river routing") which apparently must be specifically tailored for each instantiation of the organelle.

It was hoped that at least the MacPitts adder organelle, which is simply a standard asynchronous full adder made entirely from NOR gates, could be compared with something from the standard cell library. The most similar standard cell in the catalogue is an adder/subtractor [Ref. 20: p. 10], which is based on the OM2 arithmetic logic unit [Ref. 2: pp. 145-181]. This cell is much more flexible, yet also more specialized, than the MacPitts adder. It is capable of a full range of Boolean operations, not just addition, as determined by the values on two 4 bit control port lines which are threaded through the cell. It also differs from the organelle in that its operation is clocked. Although a comparison based on size hardly seems meaningful for these two dissimilar units, it is noted that the organelle measures 250 x 40 lambda units, using measurements taken from actual layout plots. The standard cell adder measures 211 x 32 lambda units, as specified in [Ref. 20: p. 11].

The MacPitts static register organelle has no functional parallel in the standard cell library, for the reasons mentioned above. It measures 64 x 30 lambda units, excluding the clock buffer unit, which contains a load enable line affecting all the bits in the same register. The standard cell dynamic shift register bit measures 88 x 24 lambda units, and contains a selector input line for each bit of the register built from these cells.

C. SOFTWARE INCOMPATIBILITIES 

The authors of MacPitts have extended the CIF language so that "0" at the beginning of a line indicates that the rest of the line contains the coordinates of a node, the mask layer to which it applies, and a label name for that node. This is a useful feature with the Stanford node extraction programs, which recognize this label device and use it automatically to make the node accessible to simulation programs simply by calling its name. This extension of CIF is unknown to the Berkeley VLSI tools, which use another CIF extension, "94," to flag node labels.
V. CONCLUSION



A. SUMMARY

This thesis has described silicon compilers, and demonstrated how the MacPitts silicon compiler can be employed to design a digital pipelined multiplier using a partitioning concept.

Shortcomings of this silicon compiler have been found which make the results produced by it inferior in some ways to those produced by practiced designers. These shortcomings may be outweighed, for some applications, by the reduction in design time. The functional correctness of the MacPitts multiplier design has been demonstrated to the extent allowed by available simulation tools. Other MacPitts designs may contain errors, which can be edited out with relative ease.

The user of MacPitts can affect the output of the compilation process in two meaningful ways. First, it may be possible to write the behavioral specification algorithm to allow partitioning of the design among more than one chip. This possibility should be explored when layout size is a problem. Second, proper assignment of pins can reduce the worst-case length of pin pad wiring.

MacPitts has been found compatible, except in a few cases, with other VLSI design tools at NPS. The caesar VLSI editor has been particularly useful, along with the cifplot stipple plotter, in gaining insight into the processes employed by MacPitts in producing a layout.

Although the final multiplier design was submitted for 
fabrication, unexpected delays in production schedules 
precluded testing the finished product as part of this 
research. 
B. RECOMMENDATIONS



The following recommendations should be considered: 

1. Test the multiplier chips, when they become available, using the event simulation macros and as many other input combinations as facilities allow. Single-cycle testing should be done before dynamic testing is undertaken using a direct memory access tester.

2. Dissect MacPitts designs with caesar, saving in separate 
cif files useful symbols to add to the local VLSI library. 
Symbols such as pad frames or entire data path units may be 
of interest. 

3. Write new organelles for the MacPitts library. A carry- 
look-ahead adder would be a useful addition. 

4. Enlarge the capabilities of MacPitts to produce designs 
in a CMOS technology. This would involve not only writing 
new data path organelles, but modifying the control unit 
architecture, as well. 

5. Obtain a capability locally to handle file transfers over the ARPANET/MILNET system.






APPENDIX A

INSTALLATION OF MACPITTS ON VAX-11/780 UNDER UNIX 4.1 AND 4.2



A. INSTALLATION UNDER UNIX 4.1 OPERATING SYSTEM

MacPitts is distributed as a collection of discrete source code files written in the "C" programming language and in Franz Lisp Opus 38. Also included in this distribution are two library files containing the bonding pad layouts in CIF, and a library file containing the standard organelles. The complete list of files is given in table II. These files are located in the directory /vlsi/macpit under ownership of vlsi.

All of the operations necessary to build macpitts are sequenced by the "Makefile," a feature of the UNIX operating system that directs the automatic compilation and assembly of source programs to produce large software modules.

Building an executable version of the macpitts program requires that each source file first be compiled by the "liszt" lisp compiler or the "cc" compiler, as appropriate. The pads.l file is a lisp source which is actually generated by another lisp source. The latter source, padgen.l, filters the bonding pad CIF information contained in the rinout and pads20 files, and produces pads.l, a list of bonding pad information in the standard syntax of Franz Lisp. Pads.l is then "liszt"ed (compiled) to produce the pads.o object file. The next step of the process fast-loads all of the compiled object files, linking them together in a single lisp "environment." Finally, the default settings for all the macpitts options invoked at run time are overlaid. It is this linked lisp environment, with the
TABLE II

MacPitts Source Files

Makefile - a makefile used to build
the complete MacPitts system

l5.l - layout language used by macpitts
to generate CIF

- next 13 files are the lisp
source code for MacPitts

control.l
data-path.l
defstructs.l
extract.l
flags.l
frame.l
front-page.l
general.l
interpret.l
order.l
pads.l
prepass.l

has built-in organelles

layout of obj file starts here

- interactive interpreter

created during "make macpitts"
execution starts here

padgen.l - makes pads.l from next 2 files
rinout - Stanford Cell Library pads
pads20 - MOSIS 2.0 micron pads

library - standard macro, function, test,
and organelle library

organelles.l - compiled portion of organelle library

lincoln.l - the Lincoln Laboratory lisp environment
c-routines.c - interfaces to operating system

macpitts - dumped MacPitts environment



defaults set, which is finally dumped as the binary execu- 
table module: macpitts. To repeat: this entire process is 

performed automatically by the Makefile. 

Because this dumped lisp environment embodies all the built-in functions of Franz Lisp, as well as the functions of macpitts, it contains a very large number of lisp functions. To accommodate all these functions, the Franz Lisp compiler must be rebuilt with new values for the parameters MAXFNS and TRENTS, which set the maximum number of functions and function table entries allowable. Also, the padgen.l file uses the "untyi" function of the Franz Lisp Opus 38 fast loader, which permits insertion of a single character in the input buffer string. The "untyi" function is not a part of the Franz Lisp Opus 36 source supplied with UNIX 4.1. Therefore, when Franz Lisp is remade with the new MAXFNS and TRENTS values, the "untyi" function must be added to the fast loader source code. The steps to accomplish a remake of Franz Lisp are as follows:



• In the file /usr/src/cmd/lisp/franz/sysat.c add the following line to the group of MK declarations:
MK("untyi", Luntyi, lambda);

• In the file /usr/src/cmd/lisp/franz/h/lfuncs.h add the following line to the group of lispval declarations:
lispval Luntyi();

• In the file /usr/src/cmd/lisp/franz/lam6.c append the following code segment:

lispval
Luntyi()
{
        lispval port, ch;

        port = nil;
        switch (np - lbot) {
        case 2: port = lbot[1].val;     /* optional port argument; falls through */
        case 1: ch = lbot[0].val;       /* fixnum character to push back */
                break;
        default:
                argerr("untyi");
        }

        if (TYPE(ch) != INT) {
                errorh(Vermisc, "untyi: expects fixnum character ",
                       nil, FALSE, 0, ch);
        }

        /* push ch back onto the given port, or else onto the current
           input port (stdin if none is set) */
        ungetc((int) ch->i, okport(port, okport(Vpiport->a.clb, stdin)));
        return (ch);
}

• In the file /usr/src/cmd/lisp/franz/nfasl.c change the value of MAXFNS to 10000.

• In the file /usr/src/cmd/lisp/franz/h/structs.h change the value of TRENTS to 1024.

• Do a "make all" from the directory /usr/src/cmd/lisp.

Franz Lisp is now ready to compile MacPitts. The next step is to correct and modify the source code for MacPitts itself.

• In the file /vlsi/macpit/c-routines.c add these lines at the beginning:

#define VPRINT      0100
#define VPLOT       0200
#define VPRINTPLOT  0400

#define VGETSTATE   (('v'<<8)|0)

#define VSETSTATE   (('v'<<8)|1)

• In the same file add the following lines after line 188: 

static int plotmd[] = { VPLOT, 0, 0 };
static int prtmd[] = { VPRINT, 0, 0 };

• In the same file change line 199 to: 

ioctl(plotter, VSETSTATE, plotmd);

• In the same file change line 207 to: 

ioctl(plotter, VSETSTATE, prtmd);

• In the file /vlsi/macpit/Makefile change line 5 to:

MacPitts = /vlsi/macpit/bin/macpitts

• In the same file change line 83 to: 

(load 'interpret.l) \
• In the same file change line 84 to:

(setq macpitts-directory '/vlsi/macpit) \

• In the same file change line 87 to: 

(setq option-list '(opt-d opt-c stat obj cif nologo)) \

• In the same file change line 94 to:

mv macpitts $(MacPitts)

• In the file /vlsi/macpit/interpret.l change line 18 to:

(setq library (get-library))

• In the file /vlsi/macpit/lincoln.l change line 1093 to:

(cfasl '|/vlsi/macpit/c-routines.o -lcurses -ltermcap|

After making these changes, macpitts is ready to "make." Type "make macpitts." All the files will be compiled, linked, loaded, and then dumped as a complete macpitts lisp environment. This takes about 45 minutes on a lightly loaded system. Next type "make install." This command simply moves the dumped executable module into the directory /vlsi/macpit/bin. Now type "make clean" to remove all the lisp object files that are no longer needed. The size of the macpitts executable module is 1384704 bytes. Finally, any user of macpitts should add the directory /vlsi/macpit/bin to the path list in the .login file in his home directory.

B. INSTALLATION UNDER UNIX 4.2 OPERATING SYSTEM

The macpitts generated on a UNIX 4.1 system will not run under UNIX 4.2, because the system calls are different. The version of Franz Lisp supplied with UNIX 4.2 is Opus 38, which already includes the "untyi" function. Therefore it is not necessary to modify the sysat.c, lfuncs.h, or lam6.c files. It is necessary, however, to increase the MAXFNS and TRENTS values just as in the case of a UNIX 4.1 installation. For 4.2 these parameters are found in the files /usr/src/ucb/lisp/franz/fasl.c and /usr/src/ucb/lisp/franz/h/structs.h, respectively. After making these two changes, change directories to /usr/src/ucb/lisp, enter super-user, and issue the command "lispconf." This starts up an interactive program which allows you to specify the type of machine on which Franz Lisp is being installed. The answers to the questions posed by this script will be obvious if you are using a VAX computer running UNIX 4.2. Next issue "make fast" from the same directory and the lisp system will be generated. This step takes about 2 hours on a lightly loaded machine. After this is done, issue "make install" to move the files into the standard system directories.

The 4.2 operating system also contains another bug that will prevent the macpitts interpreter from running. In the file /usr/src/usr.lib/libterm/tputs.c change OSPEED to TOSPEED everywhere it occurs, then recompile tputs.c. This is to avoid multiple definition of OSPEED in this file and in another file, /usr/src/usr.lib/libcurses/cr_tty.c.

The modifications to the MacPitts source code itself are the same as those required for a UNIX 4.1 installation, with the following exception and addition:

• In the file /vlsi/macpit/Makefile it is not necessary to change line 83. This line should remain:

(fasl 'interpret)

• Opus 38 of Franz Lisp, unlike Opus 36, complains if parameters declared in a functional definition are not used in the definition itself. The MacPitts source code contains an instance of this malpractice. Therefore, in the file /vlsi/macpit/frame.l

change line 1338 to:
(lambda (pad)



The process of "make macpitts" is done the same as for UNIX 4.1, but the results are somewhat different. Franz Lisp issues warnings during compilation whenever an expression is encountered which does not have the proper number of parameters immediately available. These warnings occur frequently when macpitts is made under UNIX 4.2. This happens because the macpitts source code is contained in many separate files, each of which may have external references that remain unresolved until the object modules are all loaded and linked together. These warnings have no effect on the quality of the macpitts produced, but their delivery does consume CPU time. As a result, it takes approximately 90 minutes to "make macpitts" under UNIX 4.2. The final MacPitts executable is 1567888 bytes long in Opus 38 on 4.2. Finally, remember to add the /vlsi/macpit/bin directory to the path list in the .login file in your home directory.
APPENDIX B

INSTALLATION OF THE CAESAR VLSI EDITOR UNDER UNIX 4.1 AND 4.2



A. INSTALLATION UNDER UNIX 4.1

The caesar VLSI circuit editor is one of many programs contained in the distribution of 1983 VLSI C.A.D. tools from U. C. Berkeley. The distribution tape is loaded, in its entirety, in the directory /vlsi/berk83 under ownership of vlsi. Before installing the tools, perform the following:

1. Have the system programmers create a new user, 
"sleeper,” with password "caesar," and home directory 
/vlsi/berk83/bin. Create a ".login" file in /vlsi/ 
berk83/bin which consists of only the following two 
lines:

sleeper 

logout 

This step allows the use of a graphics tablet to posi- 
tion the cursor in caesar, an important facility. 

2. Have the system programmer create another new user,
"cad," with a closely held password and home directory
/vlsi/berk83. This step resolves the many references
to "~cad" which are scattered throughout the distribution
tape.

3. In the file /vlsi/berk83/man/tmac.anc replace every
occurrence of the string ~cad with the string
/vlsi/berk83.

4. Edit the file /vlsi/berk83/lib/displays to contain
only the following one line: 

/dev/tty22 /dev/tty20 std AED767 
5. Edit the file /vlsi/berk83/src/caesar/config.c to
replace every occurrence of the string ~cad with the
string /vlsi/berk83.

6. In the file /vlsi/berk83/src/caesar/main.c find the
single "return" statement in the procedure
"OnCommand." Just before that statement, add a line
containing the statement "GrFlush();".

7. In the file /vlsi/berk83/src/makewhatis.csh remove the
string "man4" from line 8.
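Steps 3 and 5 are plain text substitutions of ~cad by /vlsi/berk83. The thesis does not say how the edits were made (presumably with a text editor); as a hedged alternative, they can be scripted. The sketch below is Python, and the helper name replace_in_file is hypothetical. It assumes each file is ordinary text small enough to read and rewrite in one piece.

```python
from pathlib import Path

OLD = "~cad"          # home-directory reference scattered through the tape
NEW = "/vlsi/berk83"  # where the tape was actually loaded


def replace_in_file(path: Path, old: str = OLD, new: str = NEW) -> int:
    """Replace every occurrence of `old` with `new` in `path`.

    Returns the number of occurrences replaced; the file is only
    rewritten when at least one occurrence was found.
    """
    text = path.read_text()
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new))
    return count
```

For example, calling it on /vlsi/berk83/man/tmac.anc would cover step 3 and calling it on /vlsi/berk83/src/caesar/config.c would cover step 5. A second call returns 0, so repeating the step does no harm.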

Now proceed with the installation by issuing the
following commands. Allow each command to run to completion
before issuing the next. Completion is indicated by the
return of the system prompt, "%".
cd /vlsi/berk83/src/caesar
make
mv caesar /vlsi/berk83/bin/caesar
rm *.o
cd ../..
src/makewhatis.csh

This completes the installation of caesar, mextra, cadman, 
and cif2ca. There are other programs on this distribution 
for which the foregoing procedure should have also been 
sufficient to achieve a satisfactory installation, but these 
remain untested. 

Finally, any user of these tools should add the
directory /vlsi/berk83/bin to the path list in the .login
file of his home directory.



B. INSTALLATION UNDER THE UNIX 4.2 OPERATING SYSTEM

The Unix 4.2 operating system uses timing and interrupt 
calls which differ significantly from those used by Unix 
4.1. Therefore, because caesar makes extensive use of these
calls, the tool as installed for 4.1 will not run under 4.2. 
A different distribution tape has been written for the 
Berkeley 1983 design tools under UNIX 4.2. Installation of 
this distribution proceeds in the same way as the 4.1 
distribution except that step 6 is unnecessary. The bug 
that this step corrects has already been corrected on the 
4.2 distribution tape. 

It is also necessary to change a line which occurs in
five files in the directory /vlsi/berk83/src/caesar
from #include <time.h>
to #include <sys/time.h>

The five files affected are main.c, aed4.c, omega4.c,
ramtek4.c and vect4.c.
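The include change above affects the same line in five files, so it can be scripted as well. This is a hedged Python sketch, not part of the original procedure. The helper name patch_time_include is hypothetical, and the sketch assumes the include appears on a line of its own exactly as written in the text.

```python
from pathlib import Path
from typing import List

# The five caesar source files named in the text.
FILES = ["main.c", "aed4.c", "omega4.c", "ramtek4.c", "vect4.c"]
OLD_LINE = "#include <time.h>"
NEW_LINE = "#include <sys/time.h>"


def patch_time_include(src_dir: Path) -> List[str]:
    """Swap the 4.1 <time.h> include for <sys/time.h>; return the patched file names."""
    patched = []
    for name in FILES:
        path = src_dir / name
        lines = path.read_text().splitlines(keepends=True)
        changed = False
        for i, line in enumerate(lines):
            if line.strip() == OLD_LINE:
                # Keep the original line ending; only the include target changes.
                lines[i] = line.replace(OLD_LINE, NEW_LINE)
                changed = True
        if changed:
            path.write_text("".join(lines))
            patched.append(name)
    return patched
```

Calling patch_time_include(Path("/vlsi/berk83/src/caesar")) before running make would apply the change; files that already include <sys/time.h> are left untouched.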

Now proceed with the installation by issuing the 
following commands: 

cd /vlsi/berk83/src/caesar
make

mv caesar /vlsi/berk83/bin/caesar
rm *.o

cd ../..

src/makewhatis.csh

Finally, add the directory /vlsi/berk83/bin to the path 
list in the .login file in your home directory. 






APPENDIX C 



MANUAL PAGES FOR BERKELEY DESIGN TOOLS 

An on-line operator's manual exists for all of the VLSI
design tools in the 1983 distribution from Berkeley.
Information on the use of any of these can be made to appear
on the terminal screen by issuing

cadman <program> 

where <program> can be cadman, caesar, cif2ca, cifplot, 
esim, mextra, or any of the other programs in that distribu- 
tion. Only those pages affecting tools used in this silicon 
compiler research are reproduced in this appendix. 

Note that the cadman program is contained in the direc- 
tory 

/vlsi/berk83/bin

Therefore either include this directory in the search path
of your ".login" file or invoke cadman by its full path
name:



/vlsi/berk83/bin/cadman <program>. 
NAME

cadman - run off section of UNIX manual 
SYNOPSIS 

cadman [ - ] [ -t ] [ section ] title ... 

DESCRIPTION 

Cadman is a program which prints sections of the cad manual.
Section is an optional arabic section number, e.g. 3, which
may be followed by a single letter classifier, e.g. 1m,
indicating a maintenance type program in section 1. It may
also be ''cad'', ''new'', ''junk'', or ''public''. If a section
specifier is given, cadman looks in that section of the
cad manual for the given titles. If section is omitted,
cadman searches all sections of the cad manual, giving
preference to commands over subroutines in system libraries,
and printing the first section it finds, if any.

If the standard output is a teletype, or if the flag - is
given, then cadman pipes its output through ssp(1) to crush
out useless blank lines, ul(1) to create proper underlines
for different terminals, and more(1) to stop after
each page on the screen. Hit a carriage return to continue,
or a control-D to scroll 12 more lines, when the output stops.

The -t flag causes cadman to arrange for the specified sec- 
tion to be troff'ed to the Versatec. 



FILES 

~cad/doc/cadman/man?/*

SEE ALSO 

Programmer's manual: more(1), ul(1), ssp(1), man(1),
apropos(1)



BUGS 

The manual is supposed to be reproducible either on the
phototypesetter or on a typewriter. However, on a typewriter
some information is necessarily lost.
NAME

caesar - VLSI circuit editor 
SYNOPSIS 

caesar [ -n -g graphics_port -t tablet_port -p path -m 
monitor_type -d display_type ] [ file ] 

DESCRIPTION 

Caesar is an interactive system for editing VLSI circuits at 
the level of mask geometries. It uses a variety of color 
displays with a bit pad as well as a standard text terminal. 
For a complete description and tutorial introduction, see 
the user manual "Editing VLSI Circuits with Caesar" (an
on-line copy is in ~cad/doc/caesar.tblms).

Command line switches are: 

-n Execute in non-interactive mode. 

-g The next argument is the name of the port to use for 

communication with the graphics display. If not speci- 
fied, Caesar makes an educated guess based on the ter- 
minal from which it is being run. 

-t The next argument is the name of the port to use for 
reading information from the graphics tablet. If not 
specified, Caesar makes an educated guess (usually the 
graphics port).

-p The next argument is a search path to be used when 
opening files. 

-m The next argument is the type of color monitor being 

used, and is used to select the right color map for the 
monitor's phosphors. "std" works well for most moni- 
tors, "pale" is for monitors with especially pale blue 
phosphor.

-d The next argument is the type of display controller 

being used. Among the display types currently under- 
stood are: AED512, UCB512 (the AED512 with special 

Berkeley PROMs for stippling), AED767, AED640 (an
AED767 configured as 483x640 pixels), Omega440, R9400,
or Vectrix.

When Caesar starts up it looks for a command file with the 
name ".caesar" in the home directory and processes it if it 
exists. Then Caesar looks for a .caesar file in the current 
directory and reads it as a command file if it exists. The 
.caesar file format is described under the long command
source.

You generally have to log in on the color terminal under the
name "sleeper" (password "caesar"). This is necessary in
order for the tablet to be usable. Sleeper can be killed
by typing two control-backslashes in quick succession on the
color display keyboard (on the AED displays, control-backslash
is gotten by typing control-shift-L).



The four buttons on the graphics tablet puck are used in the 
following way: 

left (white) (#2) 

Move the box so that its fixed corner (normally lower- 
left) coincides with the crosshair position. 

right (green) (#4) 

Move the box's variable corner (normally upper-right) 
to coincide with the crosshair position. The fixed 
corner is not moved. 

top (yellow) (#1) 

Find the cell containing the crosshair whose lower-left 
corner is closest to the crosshair. Make that cell the 
current cell. If the button is depressed again without 
moving the crosshair, the parent of the current cell is 
made the current cell. 

bottom (blue) (#3) 

Paint the area of the box with the mask layers under- 
neath the crosshair. If there are no mask layers visi- 
ble underneath the crosshair, erase the area of the 
box.



SHORT COMMANDS 

Short commands are invoked by typing a single letter on the 
keyboard. Valid commands are: 



a Yank the information underneath the box into the yank 
buffer. Only yank the mask layers present under the 
crosshair (if there are no mask layers underneath the 
crosshair, yank all mask layers and labels). 

c Unexpand current cell (display in bounding box form).

d Delete paint underneath the box in the mask layers
underneath the crosshair (if there are no mask layers
underneath the crosshair, then delete labels and all
mask layers).

e Move the box up 1 lambda. 
g Toggle grid on/off.

l Redisplay the information on both text and graphics
screens.

q Move the box left 1 lambda. 

r Move the box down 1 lambda. 

s Put back (stuff) all the information in the yank buffer 
at the current box location. Stuff only information in 
mask layers that are present underneath the crosshair 
(if there are no mask layers underneath the crosshair, 
stuff all mask layers plus labels). 

u Undo the last change to the layout. 

w Move the box right one lambda. 

x Unexpand all cells that intersect the box but don't 

contain it. 

z Zoom in so that the area underneath the box fills the 
screen.

C Expand current cell so that its paint and children can 
be seen. 

X Expand all cells that intersect the box, recursively, 
until there are no unexpanded cells intersecting the 
box. 

Z Zoom out so that everything on current screen fills the 
area underneath the box. 

5 Move the picture so that the fixed corner of the box is 
in the center of the screen. 



6 Move the picture so that the variable corner of the box 
is in the center of the screen. 

^L Redisplay the graphics and text displays.

. Repeat the last long command. 



LONG COMMANDS 

Long commands are invoked by typing a colon character (":").
The cursor will appear on the bottom line of the text
terminal. A line containing a command name and parameters
should be typed, terminated by return. Each line may consist
of multiple commands separated by semi-colons (to use a colon
as part of a long command, precede it with a backslash).

Short commands may be invoked in long command format by 

preceding the short command letter with a single quote. 

Unambiguous abbreviations for command names and parameters 

are accepted. The commands are: 

align <scale> 

Change crosshair alignment to <scale>. Crosshair position
will be rounded off to the nearest multiple of
<scale>.
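The rounding that align describes can be sketched as follows. This is a hypothetical Python model, not Caesar's code: coordinates are taken to be in lambda, and ties (a position exactly half-way between two multiples) are rounded upward because the manual does not say how Caesar breaks them.

```python
import math


def align(coord: float, scale: int) -> int:
    """Round one crosshair coordinate to the nearest multiple of `scale`.

    Half-way cases are rounded upward; this tie rule is an assumption.
    """
    if scale <= 0:
        raise ValueError("scale must be a positive integer")
    return math.floor(coord / scale + 0.5) * scale


def align_point(x: float, y: float, scale: int) -> tuple:
    """Apply the same rounding to both coordinates of the crosshair."""
    return align(x, scale), align(y, scale)
```

With ":align 4", for instance, a crosshair at (9, 2) would snap to (8, 4) under this model.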

array <xsize> <ysize> 

Make the current cell into an array with <xsize>
instances in the x-direction and <ysize> instances in
the y-direction. The spacing between elements is
determined by the box x- and y-dimensions.

array <xbot> <ybot> <xtop> <ytop> 

Make the current cell into an array, numbered from 
<xbot> to <xtop> in the x-direction and from <ybot> to 
<ytop> in the y-direction. The spacing between array 
elements is determined by the box x- and y-dimensions. 
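To make the role of the box dimensions concrete, here is a hypothetical Python sketch of where each element of ":array <xbot> <ybot> <xtop> <ytop>" would sit relative to element (<xbot>, <ybot>). It assumes <xbot> <= <xtop> and <ybot> <= <ytop>, and that indices grow in the +x and +y directions; the manual states only that the spacing comes from the box, so the rest is an assumption.

```python
from typing import Dict, Tuple


def array_offsets(xbot: int, ybot: int, xtop: int, ytop: int,
                  box_width: int, box_height: int) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Offset (in lambda) of each array element from element (xbot, ybot).

    The box width gives the x spacing and the box height the y spacing.
    """
    offsets = {}
    for i in range(xbot, xtop + 1):
        for j in range(ybot, ytop + 1):
            offsets[(i, j)] = ((i - xbot) * box_width, (j - ybot) * box_height)
    return offsets
```

For a 10 by 6 lambda box, ":array 0 0 2 1" would give six elements, the last of which, (2, 1), lies 20 lambda right of and 6 lambda above the first.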

box <keyword> <amount> 

Change the box by <amount> lambda units, according to 
<keyword>. If <keyword> is one of "left", "right", 
"up", or "down", the whole box is moved the indicated 
amount in the indicated direction. If <keyword> is one 
of "xbot", "ybot", "xtop", or "ytop", then one of the 
coordinates of the box is adjusted by the given amount. 
<amount> may be either positive or negative. 
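The two families of box keywords can be modelled with a small immutable box record. This Python sketch is illustrative only: the Box type and box_command function are hypothetical, units are lambda, and the abbreviation matching that Caesar performs on keywords is omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    xbot: int
    ybot: int
    xtop: int
    ytop: int


# Unit displacement for each keyword that moves the whole box.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}
# Keywords that adjust a single coordinate of the box.
COORDS = ("xbot", "ybot", "xtop", "ytop")


def box_command(box: Box, keyword: str, amount: int) -> Box:
    """Return the box produced by ':box <keyword> <amount>'."""
    if keyword in MOVES:
        dx, dy = MOVES[keyword]
        return Box(box.xbot + dx * amount, box.ybot + dy * amount,
                   box.xtop + dx * amount, box.ytop + dy * amount)
    if keyword in COORDS:
        return replace(box, **{keyword: getattr(box, keyword) + amount})
    raise ValueError(f"unknown box keyword: {keyword!r}")
```

In this model the short commands q, w, e and r correspond to box_command(box, "left", 1), "right", "up" and "down" respectively, matching their descriptions above.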

button <number> <x> <y> 

Simulate the pressing of button <number> at the screen 
location given by <x> and <y> (in pixels). If <x> and 
<y> are omitted, the current crosshair position is 
used. 

cif -sblpx <name> <scale> 

Write out a CIF description of the layout into file 
<name> (use edit cell name by default; a ".cif" exten- 
sion is supplied by default). <scale> indicates how 
many centimicrons to use per Caesar unit (200 by 
default). The -s switch causes no silicon (paint) to 
be output to the CIF file. The -b switch causes bounding
boxes to be drawn for unexpanded cells. The -l switch
causes labels to be output. The -p switch causes a CIF
point to be generated for each label. The -x switch 
causes Caesar not to automatically expand all cells 
(they are expanded by default).
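The <scale> argument is a unit conversion. With the default of 200 centimicrons per Caesar unit, one lambda is 2 microns, so a feature 3 lambda wide corresponds to 600 CIF units (6 microns). The Python sketch below shows only the arithmetic; how Caesar actually encodes the scale in the CIF text (for example through symbol scale factors) is not described here, and the function names are hypothetical.

```python
CENTIMICRONS_PER_MICRON = 100
DEFAULT_SCALE = 200  # centimicrons per Caesar unit, the :cif default


def caesar_to_cif(units: int, scale: int = DEFAULT_SCALE) -> int:
    """Caesar (lambda) distance -> CIF distance in centimicrons."""
    return units * scale


def caesar_to_microns(units: int, scale: int = DEFAULT_SCALE) -> float:
    """Caesar (lambda) distance -> physical size in microns."""
    return units * scale / CENTIMICRONS_PER_MICRON
```

cif2ca's -l lambda option, described later in this appendix, applies the same factor in the opposite direction.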

cload <file> 
Load the colormap from <file>. The monitor type is
used as default extension. 

clockwise <degrees> [y] 

Rotate the current cell by the largest multiple of 90 
degrees less than or equal to <degrees>. <degrees> 
defaults to 90. If the command is followed by a "y" 
then the yank buffer is rotated instead of the current 
cell.
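The rule "largest multiple of 90 degrees less than or equal to <degrees>" can be checked with a short Python sketch. It models only the change of orientation, rotating a point about the origin; where Caesar places the rotated cell is not modelled, and both function names are hypothetical.

```python
def effective_rotation(degrees: int = 90) -> int:
    """Largest multiple of 90 not exceeding `degrees`, reduced to 0..270."""
    return (degrees // 90) * 90 % 360


def rotate_clockwise(x: int, y: int, degrees: int = 90) -> tuple:
    """Rotate the point (x, y) clockwise about the origin."""
    for _ in range(effective_rotation(degrees) // 90):
        x, y = y, -x  # one quarter turn clockwise
    return x, y
```

Under this rule ":clockwise 135" behaves like ":clockwise 90", and ":clockwise 45" leaves the cell unchanged.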

colormap <layers> 

Print out the red, green, and blue intensities associ- 
ated with <layers>. 

colormap <layers> <red> <green> <blue> 

Set the intensities associated with <layers> to the 
given values. 



copycell 

Make a copy of the current cell, and position it so
that its lower-left corner coincides with the
lower-left corner of the box.



csave <file> 

Save the current colormap in <file> (the monitor type 
is used as default extension).



deletecell 

Delete the current cell. 



editcell <file> 

Edit the cell hierarchy rooted at <file>. A ".ca" 
extension is supplied by default. If information in 
the current hierarchy has changed, you are given a 
chance to write it out. 

erasepaint <layers> 

For the area enclosed by the box, erase all paint in 
<layers>. If <layers> is omitted it defaults to "*l".



fill <direction> <layers>

<direction> is one of "left", "right", "up", or "down".
The paint under one edge of the box (respectively, the 
right, left, bottom, or top edge) is sampled; every- 
where that the edge touches paint, the paint is 
extended in the given direction to the opposite side of 
the box. <layers> selects which layers to fill; if 
omitted then a default of "*" is used. 



flushcell 

Remove the definition of the current cell from
main memory and reload it from the disk version. Any
changes to the cell since it was last written are lost.

getcell <file>

This command makes an instance of the cell in <file> (a 
".ca" extension is supplied by default) and positions 
that instance at the current box location. The box 
size is changed to equal the bounding box of the cell.

gridspacing

The grid is modified so that its spacings in x and y
equal the dimensions of the box. The grid is set so
that the box falls on grid points.

gripe

The mail program is run so that comments can be sent
to the Caesar maintainer.

height <size>

The box's height is set to <size>. If <size> is
preceded by a plus sign then the fixed corner is moved to
set the correct height; otherwise the variable corner
is moved. <size> defaults to 2.

identifycell <name> 

The current cell is tagged with the instance name given 
by <name>. This feature is not currently supported in 
any useful fashion. <name> may not contain any white 
space.

label <name> <position> 

A rectangular label is placed at the box location and 
tagged with <name>. <name> may not contain any white 
space. <position> is one of "center", "left", "right", 
"top", or "bottom"; it specifies where the text is to 
be displayed relative to the rectangle. If omitted, 
<position> defaults to "top". 

lyra <ruleset> 

The program ~cad/bin/lyra is run, and is passed via 
pipe all the mask features within 3L of the box. The 
program returns labels identifying design rule viola- 
tions, and these are added to the edit cell. If 
<ruleset> is specified, it is passed to Lyra with the 
-r switch to indicate a specific ruleset. Otherwise, 
the current technology is used as the ruleset. 

macro <character> <command> 

The given long command is associated with the given 
character, such that whenever the character is typed as 
a short command then the given command is executed. 

This overrides any existing definition for the
character. To clear a macro definition, type ":macro
<character>", and to clear all macro definitions, type
":macro".

mark <mark1> <mark2>

The box is saved in the mark given by <mark1>. <mark1>
must be a lower-case letter. If <mark2> is specified,
the box is changed to coincide with <mark2>.



movecell <keyword> 

The current cell is moved in one of two ways, selected 
by <keyword>. If <keyword> is "byposition", then the 
cell is moved so that its lower-left corner coincides 
with the lower-left corner of the box. This also hap- 
pens if no keyword is specified. If <keyword> is 
"bysize", then the cell is displaced by the size of the 
box (this means that what used to be at the fixed 
corner of the box will now be at the variable corner).



paint <layers> 

The area underneath the box is painted in <layers>. 



path <path> 

The string given by <path> becomes the search path used 
during file lookups. <path> consists of directory 
names separated by colons or spaces. Each name should 
end in "/". 
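A search path of this kind is used by trying each directory prefix in turn. The Python sketch below is a hypothetical model of such a lookup, not Caesar's code. It assumes the directories are tried in the order given, that a name is simply appended to each prefix (since each prefix already ends in "/"), and that the default ".ca" extension is added only when the name lacks it.

```python
import os
import re
from typing import List, Optional


def parse_path(path: str) -> List[str]:
    """Split a :path argument into directory names on colons or white space."""
    return [d for d in re.split(r"[:\s]+", path.strip()) if d]


def find_file(name: str, path: str, ext: str = ".ca") -> Optional[str]:
    """Return the first prefix + name that exists as a file, else None."""
    if not name.endswith(ext):
        name += ext
    for prefix in parse_path(path):
        candidate = prefix + name  # each directory name already ends in "/"
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, after ":path cells/ lib/" a request for the cell adder would try cells/adder.ca and then lib/adder.ca in this model.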



peek <layers> 

Display all paint underneath the box belonging to 
<layers>, even for unexpanded cells and their
descendants.



popbox <mark> 

If <mark> is specified, then the box is replaced with 
the given mark. Otherwise the box stack is popped and 
the top stack element overwrites the box. 



pushbox <mark> 

The box is pushed onto the box stack. If <mark> is 
specified then it is used to overwrite the box, other- 
wise the box remains unchanged. 

put <layers> 

The yank buffer information in <layers> is copied back 
to the box location. If <layers> is omitted, it 
defaults to "*Sl".

quit

If any cells have changed since they were last saved on
disk, the user is given a chance to write them out or
abort the command. Otherwise the program returns to
the shell.

reset

The graphics display is reinitialized and the colormap 
is reloaded. 

return 

The current subedit is left, and the containing edit is 
resumed.



savecell <name> 

If <name> is specified then the current cell is given
that name and written to disk under the name (a ".ca"
extension is supplied by default). If <name> isn't
specified then the cell is written out to the disk file
from which it was read.



scroll <direction> <amount> <units> 

The current view is moved in the indicated direction by 
the indicated amount. <direction> must be one of 
"left", "right", "up", or "down", <amount> is a 
floating-point number, and <units> is one of "screens" 
or "lambda". <units> defaults to "screens", and 
<amount> defaults to 0.5. 



search <regexp> 

Search labels and bounding boxes underneath the box for 
text matching <regexp>. See the manual entry for ed(1)
for a description of <regexp>. Push an entry onto the 
box stack for each match. Even unexpanded cells are 
searched.



sideways [y] 

Flip the current cell sideways (i.e. about a vertical 
axis). If the command is followed by a "y" then the
yank buffer is flipped instead of the current cell. 

source <filename> 

The given file is read, and each line is processed as
one long command (no colons are necessary). Any line
whose last character is backslash is joined to the
following line.

subedit

Make the current cell the edit cell, and edit it in
context.

technology <file>

Load technology information from <file>. A ".tech"
extension is supplied by default.

upsidedown [y]

Flip the current cell upside down. If the command is
followed by a "y" then the yank buffer is flipped
instead of the current cell.

usage <file>

Write out in <file> the names of all the files containing
cell definitions used anywhere in the design hierarchy.

view <mark>

If <mark> is specified, set view to it; otherwise,
change the view to encompass the entire edit cell.



visiblelayers <layers> 

Set the visible layers to include just <layers>. Preface
<layers> with a plus or minus sign to add to or
remove from the currently visible ones.



width <size> 

Set the box width to <size> (default is 2). Move variable
corner unless width is preceded by a plus sign, else move
fixed corner.



writeall

Run through interactive script to write out all cells 
that have been modified. 



yank <layers> 

Save in the yank buffer all information underneath the 
box in <layers>. <layers> defaults to "*l".

ycell <name> 

If <name> is specified, do the equivalent of ":getcell 
<name>". Then expand current cell, yank it, delete the 
cell, and put back everything that was yanked. This 
flattens the hierarchy by one level. 



ysave <name> 

Save the yank buffer contents in a cell named <name>. A 
".ca" extension is provided by default. 



LAYERS 

nMOS mask layers are: 



p or r 

Polysilicon (red) layer, 
d or g 

Diffusion (green) layer, 
m Metal (blue) layer, 
i or y 
Implant (yellow) layer,
b Buried contact (brown) layer, 

c Contact cut layer, 

o Overglass hole (gray), layer. 

e Error layer: used by design rule checkers and other
programs.

CMOS P-well mask layers are (using technology cmos-pw) : 
p or r 

Polysilicon (red) layer, 
d or g 

Diffusion (green) layer, 
m Metal (blue) layer, 

c Contact cut layer. 

P or y 

P+ implant (pale yellow) layer, 
w P-well (brown stipple) layer. 

o Overglass hole (gray) layer.

e Error layer: used by design rule checkers and other 

programs. 

Predefined system layers are: 

* All mask layers. 

l Label layer.

S Subcell layer. 

C Cursor layer. 

G Grid layer. 

B Background layer. 

SYSTEM MARKS 

C The bounding box of the current cell. 

E The bounding box of the edit cell. 
P The previous view.

R The bounding box of the root cell. 

V The current view. 

FILES 

~cad/new/caesar, ~cad/doc/caesar.tblms



SEE ALSO 

cif2ca(1)

NAME

cif2ca - convert CIF files to CAESAR files 
SYNOPSIS 

cif2ca [ -l lambda ] [ -t tech ] [ -o offset ] ciffile
DESCRIPTION 

cif2ca accepts as input a CIF file and produces a CAESAR
file for each defined symbol. Specifying the -l lambda
option scales the output to lambda centi-microns per lambda.
The default scale is 200 centi-microns per lambda. The -t
tech option causes layers from the specified technology to
be acceptable. The default technology is nmos. For a list
of acceptable technologies, see caesar(1). The -o offset
option causes all CIF numbers to be incremented by offset.
This is useful when the CIF numbers are used for Caesar file 
names, and when several CIF files with overlapping numbers 
are to be joined together in Caesar. 

Each symbol defined in the CIF file creates a CAESAR file. 

By default, the files are named ''symbolm.ca'', where m is
the CIF symbol number (as modified by the -o offset). Symbols
can also be named with a user-extension ''9'' command,
giving a name to the symbol definition which encloses it.

CIF commands which appear outside of symbol definitions are
gathered into a symbol called, by default, ''project'', and
are output to the CAESAR file ''project.ca''.
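The naming rules above can be summarized in a few lines. This Python sketch is hypothetical (cif2ca itself is not written this way), and it assumes a name given by a ''9'' command is used with a ".ca" suffix, which the text implies but does not spell out.

```python
from typing import Optional


def caesar_file_name(symbol_number: Optional[int], offset: int = 0,
                     user_name: Optional[str] = None) -> str:
    """File name cif2ca would give one CIF symbol, per the rules above."""
    if user_name:                # symbol named by a user-extension "9" command
        return user_name + ".ca"
    if symbol_number is None:    # commands outside any symbol definition
        return "project.ca"
    return f"symbol{symbol_number + offset}.ca"
```

This also shows why -o offset helps when joining CIF files: symbol 1 from one file and symbol 1 from another converted with -o 100 become symbol1.ca and symbol101.ca instead of colliding.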

SEE ALSO 

caesar(1)

DIAGNOSTICS 

Diagnostics from cif2ca are supposed to be self-explanatory.
Each diagnostic gives the line number from the input file,
an error class (informational, warning, fatal, or panic),
the error message, and the action taken by cif2ca, usually
to ignore the CIF command. Informational messages usually
refer to limitations of cif2ca. Warning messages usually
refer to inconsistencies in the CIF file; these will
typically result in CAESAR files which do not accurately
reflect the input CIF file. Fatal messages refer to fatal
inconsistencies or errors in the CIF file. A fatal error
terminates cif2ca processing. Panic messages refer to
internal problems with cif2ca. If any diagnostics are
produced, a summary of the diagnostics is produced.

BUGS 

''Delete Definitions'' commands are not implemented. cif2ca
also has certain restrictions due to restrictions of CAESAR,
e.g. non-manhattan objects are not allowed. Library cells are
not automagically included.

Some care should be taken in naming symbols, since symbol
names are used for CAESAR file names. Names which are not
unique in the first 14 characters will attempt to create the
same CAESAR file, and only the last one wins. Similarly,
one should avoid trying to have two project.ca files in the
same directory.
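Before converting a large CIF file, the 14-character problem described above can be checked for in advance. This hypothetical Python sketch groups symbol names that agree in their first 14 characters; any group it returns would end up in a single CAESAR file, with only the last definition surviving.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

NAME_LIMIT = 14  # significant characters, per the cif2ca BUGS section


def colliding_names(names: Iterable[str], limit: int = NAME_LIMIT) -> Dict[str, List[str]]:
    """Group symbol names that agree in their first `limit` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    # Only prefixes shared by two or more names are collisions.
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}
```

For example, shift_register_a and shift_register_b share the 14-character prefix shift_register and would collide, while adder is unaffected.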
NAME

cifplot - CIF interpreter and plotter 
SYNOPSIS 

cifplot [ options ] filel.cif [ file2.cif ... ] 

DESCRIPTION 

Cifplot takes a description in Cal-Tech Intermediate Form
(CIF) and produces a plot. CIF is a low-level graphics
language suitable for describing integrated circuit layouts.
Although CIF can be used for other graphics applications,
for ease of discussion it will be assumed that CIF is used
to describe integrated circuit designs. Cifplot interprets
any legal CIF 2.0 description including symbol renaming and
Delete Definition commands. In addition, a number of local
extensions have been added to CIF, including text on plots
and include files. These are discussed later. Care has
been taken to avoid any arbitrary restrictions on the CIF
programs that can be plotted.

To get a plot call cifplot with the name of the CIF file to 
be plotted. If the CIF description is divided among several 
files call cifplot with the names of all files to be used.
Cifplot reads the CIF description from the files in the
order that they appear on the command line. Therefore the
CIF End command should be only in the last file since
cifplot ignores everything after the End command. After
reading the CIF description but before plotting, cifplot will
print an estimate of the size of the plot and then ask if it
should continue to produce a plot. Type y to proceed and n
to abort. A typical run might look as follows:

% cifplot lib.cif sorter.cif
Window -5700 174000 -76500 168900 
Scale: 1 micron is 0.004075 inches 
The plot will be 0.610833 feet 
Do you want a plot? y 



After typing y cifplot will produce a plot on the Benson- 
Varian (11 inch Versatec) plotter. 

Cifplot recognizes several command line options. These can 
be used to change the size and scale of the plot, change 
default plot options, and to select the output device. 
Several options may be selected. A dash (-) must precede
each option specifier. The following is a list of options
that may be included on the command line:



-w xmin xmax ymin ymax

(window) The -w option specifies the window; by
default the window is set to be large enough to contain
the entire plot. The windowing command lets you plot
just a small section of your chip, enabling you to see
it in better detail. Xmin, xmax, ymin, and ymax should
be specified in CIF coordinates.

-s float 

(scale) The -s option sets the scale of the plot. By 
default the scale is set so that the window will fill 
the whole page. Float is a floating point number 
specifying the number of inches which represent 1
micron. A recommended size is 0.02. 
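The size estimate printed before plotting follows from the window and the scale. In the Python sketch below, CIF coordinates are taken in the standard CIF unit of 0.01 micron, and plot_size_inches is a hypothetical helper. For the sample run shown earlier, the x extent comes to about 7.32 inches, within about 0.01 inch of the 0.61 feet (about 7.33 inches) that cifplot reported, and the y extent to about 10 inches, which appears to be the usable width of the 11-inch paper. The manual does not say how the default scale is chosen, so that last reading is an inference.

```python
CIF_UNITS_PER_MICRON = 100  # CIF coordinates are in hundredths of a micron


def plot_size_inches(xmin: int, xmax: int, ymin: int, ymax: int,
                     inches_per_micron: float) -> tuple:
    """Width and height, in inches, of a CIF window plotted at the given scale."""
    width_um = (xmax - xmin) / CIF_UNITS_PER_MICRON
    height_um = (ymax - ymin) / CIF_UNITS_PER_MICRON
    return width_um * inches_per_micron, height_um * inches_per_micron


# Window and scale reported in the sample run.
w, h = plot_size_inches(-5700, 174000, -76500, 168900, 0.004075)
print(f"{w:.2f} x {h:.2f} inches")  # prints "7.32 x 10.00 inches"
```

At the recommended scale of 0.02, the same 1797-micron x extent would need about 36 inches of paper.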

-l layer list

(layer) Normally all layers are plotted. The -l option
specifies which layers NOT to plot. The layer list
consists of the layer names separated by commas, no 
spaces. There are some reserved names: allText, bbox, 
outline, text, pointName, and symbolName. Including 
the layer name allText in the list suppresses the plot- 
ting of text; bbox suppresses the bounding box around 
symbols. outline suppresses the thin outline that 
borders each layer. The keywords text, pointName, and 
symbolName suppress the plotting of certain text 
created by local extension commands. text eliminates 
text created by user extension 2. pointName eliminates 
text created by user extension 94. symbolName elim- 
inates text created by user extension 9. allText, 
pointName, and symbolName may be abbreviated by at, pn,
and sn respectively.

-c n 

(copies) Makes n copies of the plot. Works only for
the Varian and Versatec. Default is 1 copy.

-d n 

(depth) This option lets you limit the amount of detail 
plotted in a hierarchically designed chip. It will 
only instantiate the plot down n levels of calls.
Sometimes too much detail can hide important features 
in a circuit. 

-g n 

(grid) Draw a grid over the plot with spacing every n 
CIF units. 

-h (half) Plot at half normal resolution. (Not yet
implemented.)

-e (extensions) Accept only standard CIF. User extensions 
produce warnings. 

-I (non-Interactive) Do not ask for confirmation. Always 
plot. 
-L (List) Produce a listing of the CIF file on standard

output as it is parsed. Not recommended unless debug- 
ging hand-coded CIF since CIF code can be rather long. 



-a n

(approximate) Approximate a roundflash with an n-sided
polygon. By default n equals 8. (I.e. roundflashes
are approximated by octagons.) If n equals 0 then output
circles for roundflashes. (It is best not to use
full circles since they significantly slow down plotting.)
(Full circles not yet implemented.)
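A CIF roundflash is a filled circle given by its diameter and center. The polygon approximation described above can be sketched in Python as below. The vertices are placed on the circle itself (an inscribed polygon), which is an assumption, since the manual does not say whether cifplot inscribes or circumscribes; the function name is hypothetical, and the n = 0 full-circle case is not handled.

```python
import math
from typing import List, Tuple


def roundflash_polygon(cx: float, cy: float, diameter: float,
                       n: int = 8) -> List[Tuple[float, float]]:
    """Vertices of an n-sided polygon standing in for a CIF roundflash."""
    if n < 3:
        raise ValueError("need at least 3 sides (n = 0, full circles, is not handled)")
    r = diameter / 2
    # Evenly spaced vertices on the circle, starting on the +x axis.
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]
```

With the default n of 8 this yields an octagon; larger n follows the circle more closely at the cost of more edges to plot.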



-b "text"

(banner) Print the text at the top of the plot. 



-C (Comments) Treat comments as though they were spaces. 

Sometimes CIF files created at other universities will 
have several errors due to syntactically incorrect com- 
ments. (I.e. the comments may appear in the middle of 
a CIF command or the comment does not end with a semi- 
colon.) Of course, CIF files should not have any errors 
and these comment related errors must be fixed before 
transmitting the file for fabrication. But many times 
fixing these errors seems to be more trouble than it is 
worth, especially if you just want to get a plot. This 
option is useful in getting rid of many of these com- 
ment related syntax errors. 

-r (rotate) Rotate the plot 90 degrees. 

-V (Varian) Send output to the Varian. (This is the
default option.)



-W (Wide) Send output directly to the Versatec. (Not
available at NPS.)



-S (Spool) Store the output in a temporary file then dump 
the output quickly onto the Versatec. Makes nice crisp 
plots; also takes up a lot of disk space. 

-T (Terminal) Send output to the terminal. (Not yet fully 
implemented.)



-Gh
-Ga (Graphics terminal) Send output to terminal using its
graphics capabilities. -Gh indicates that the terminal
is an HP2648. -Ga indicates that the terminal is an
AED 512.



-X basename 

(extractor) From the CIF file create a circuit
description suitable for switch level simulation. It
creates two files: basename.sim which contains the
circuit description, and basename.nodes which contains
the node numbers and their location used in the circuit
description.

When this option is invoked no plot is made. Therefore 
it is advisable not to use any of the other options 
that deal only with plotting. However, the window , 
layer , and approx imate options are still appropriate. 

To get a plot of the circuit with the node numbers call 
cif plot again, without the -X option, and include 
basename . nodes in the list of CIF‘ files to be plotted. 
(This file must appear in the list of files before the 
file with the CIF End command.) 



-c n

(copies) The -c option specifies the number of copies of the plot you would like. This allows you to get many copies of a plot with no extra computation.

-P pattern file

(Pattern) The -P option lets you specify your own layers and stipple patterns. The pattern file may contain an arbitrary number of layer descriptors. A layer descriptor is the layer name in double quotes, followed by 8 integers. Each integer specifies 32 bits, where ones are black and zeroes are white; thus the 8 integers specify a 32 by 8 bit stipple pattern. The integers may be in decimal, octal, or hex. Hex numbers start with '0x'; octal numbers start with '0'. The CIF syntax requires that layer names be made up of only uppercase letters and digits, and be no longer than four characters. The following is an example of a layer description for polysilicon:

"NP" 0x08080808 0x04040404 0x02020202 0x01010101 

0x80808080 0x40404040 0x20202020 0x10101010 
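To illustrate the layer descriptor format described above, the following Python sketch parses a descriptor and renders its 32 by 8 stipple as text, drawing '#' for a black (1) bit and '.' for a white (0) bit. The function names are hypothetical and this is not cifplot's own code. The number prefixes ('0x' for hex, a leading '0' for octal, otherwise decimal) and the layer-name rule follow the description above; drawing bit 31 as the leftmost pixel is an assumption.

```python
import shlex

def parse_int(token: str) -> int:
    """Decode one pattern word: '0x' prefix is hex, a leading '0' is octal, else decimal."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    if len(token) > 1 and token.startswith("0"):
        return int(token, 8)
    return int(token, 10)

def parse_descriptor(text: str) -> tuple[str, list[int]]:
    """Split '"NP" w1 ... w8' into the layer name and its eight 32-bit row words."""
    tokens = shlex.split(text)          # removes the double quotes around the name
    name, words = tokens[0], [parse_int(t) for t in tokens[1:]]
    if not (1 <= len(name) <= 4 and all(c.isdigit() or "A" <= c <= "Z" for c in name)):
        raise ValueError(f"illegal CIF layer name: {name!r}")
    if len(words) != 8 or any(not 0 <= w < 2**32 for w in words):
        raise ValueError("expected eight 32-bit integers")
    return name, words

def render(words: list[int]) -> list[str]:
    """Draw each row, '#' for a 1 (black) bit; bit 31 is assumed to be leftmost."""
    return ["".join("#" if (w >> (31 - i)) & 1 else "." for i in range(32)) for w in words]

name, rows = parse_descriptor('"NP" 0x08080808 0x04040404 0x02020202 0x01010101 '
                              '0x80808080 0x40404040 0x20202020 0x10101010')
print(name)
for line in render(rows):
    print(line)
```

Run on the poly-silicon example, the eight rows form a repeating diagonal stripe, which is a quick way to preview a pattern file before plotting.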

-F font file

(Font) The -F option indicates which font you want for your text. The file must be in the directory '/usr/lib/vfont'. The default font is Roman 6 point. Obviously, this option is only useful if you have text on your plot.

-O filename

(Output) After parsing the CIF files, store an equivalent but easy-to-parse CIF description in the specified file. This option removes the include and array commands (see next section) and replaces them with equivalent standard CIF statements. The resulting file is suitable for transmission to other facilities for fabrication.



In the definition of CIF, provisions were made for local extensions. All extension commands begin with a number.

Part of the purpose of these extensions is to test what features would be suitable to include as part of the standard language. But it is important to realize that these extensions are not standard CIF and that many programs interpreting CIF do not recognize them. If you use these extensions, it is advisable to create another CIF file using the -O option described above before submitting your circuit for fabrication. The following is a list of extensions recognized by cifplot.

0I filename;

(Include) Read from the specified file as though it appeared in place of this command. Include files can be nested up to 6 deep.

0A s m n dx dy;

(Array) Repeat symbol s m times with dx spacing in the x-direction and n times with dy spacing in the y-direction. s, m, and n are unsigned integers; dx and dy are signed integers in CIF units.
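The array extension amounts to an m by n grid of calls to symbol s. The Python sketch below (hypothetical helpers, not part of cifplot) expands it into the offsets at which each copy lands and into equivalent standard CIF Call commands with a translation, which is the kind of rewrite the output option described above performs. Placing the first copy at the origin (0, 0) is an assumption.

```python
def array_offsets(m: int, n: int, dx: int, dy: int) -> list[tuple[int, int]]:
    """Offsets of the m*n copies in CIF units: column i along x, row j along y."""
    if m < 0 or n < 0:
        raise ValueError("m and n are unsigned integers")
    return [(i * dx, j * dy) for j in range(n) for i in range(m)]

def expand_array(s: int, m: int, n: int, dx: int, dy: int) -> list[str]:
    """Rewrite '0A s m n dx dy;' as plain CIF calls, one translated call per copy."""
    return [f"C {s} T {x} {y};" for x, y in array_offsets(m, n, dx, dy)]

for call in expand_array(5, 3, 2, 400, -250):
    print(call)
```

Because dx and dy are signed, a negative spacing simply grows the array toward negative x or y.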

1 message;

(Print) Print out the message on standard output when it is read.

2 "text" transform;

2C "text" transform;

(Text on Plot) Text is placed on the plot at the position specified by the transformation. The allowed transformations are the same as those allowed for the Call command. The transformation affects only the point at which the beginning of the text is to appear. The text is always plotted horizontally, so the mirror and rotate transformations are not of much use. Normally text is placed above and to the right of the reference point; the 2C command centers the text about the reference point.

9 name;

(Name symbol) name is associated with the current symbol.



94 name x y;

94 name x y layer;

(Name point) name is associated with the point (x, y). Any mask geometry crossing this point is also associated with name. If layer is present, then only geometry crossing the point on that layer is associated with name. For plotting, this command is similar to text on plot. When doing circuit extraction, this command is used to give an explicit name to a node. Name must not have any spaces in it, and it should not be a number.

USE WITH MACPITTS CIF

The lines starting with user extension 0, which MacPitts places at the beginning of every CIF file, must either be removed or "commented out" by enclosing them in an all-encompassing set of parentheses, thus: "( .... );".
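A minimal Python sketch of the "commented out" approach, assuming each user-extension-0 command sits on its own line whose first token is exactly "0" (a real CIF file may split commands across lines, which this sketch does not handle). The function name is hypothetical. Each run of consecutive extension-0 lines is wrapped in one CIF comment, "( ... );"; because CIF comments may contain nested parentheses only if they are balanced, the sketch also assumes the wrapped lines have balanced parentheses. Lines such as "0I ..." or "0A ..." are left alone because their first token is not exactly "0".

```python
def comment_out_ext0(cif_text: str) -> str:
    """Wrap each run of lines starting with user extension 0 in a single CIF comment."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        if run:
            out.append("(" + "\n".join(run) + ");")
            run.clear()

    for line in cif_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "0":
            run.append(line)          # collect consecutive extension-0 lines
        else:
            flush()                   # close any open comment before normal CIF
            out.append(line)
    flush()
    return "\n".join(out)

print(comment_out_ext0("0 vdd 10 20;\n0 gnd 30 40;\nDS 1;\n0I other.cif;"))
```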

MacPitts CIF files are usually very long. It has been found most convenient to run MacPitts cifplots in the background with the non-interactive mode selected. A convenient way to do this is by using the "stipple" command:
stipple file1.cif



FILES

~cad/.cadrc
~/.cadrc

~cad/bin/vdump (only in 4.1 BSD UNIX)

~cad/bin/stipple
/usr/lib/vfont/R.6
/usr/tmp/#cif*

ALSO SEE 

mcp(cad1), vdump(cad1), cadrc(cad5)

A Guide to LSI Implementation by Hon and Sequin, Second Edition (Xerox PARC, 1980), for a description of CIF.



BUGS 

The -r option is somewhat kludgy and does not work well with the other options. A space before the semicolon in a local extension can cause syntax errors.

The -O option produces simple CIF with no scale factors in the DS commands. Because of this, you must supply a scale factor to some programs, such as the -1 option to cif2ca.
NAME

esim - event driven switch level simulator

SYNOPSIS

esim [file1 [file2 ...]]

DESCRIPTION

Esim is an event-driven switch level simulator for NMOS transistor circuits. Esim accepts commands from the user, executing each command before reading the next. Commands come in two flavors: those which manipulate the electrical network, and those which direct the simulation. Commands have the following simple syntax:

c arg1 arg2 ... argn <newline>

where 'c' is a single letter specifying the command to be performed and the argi are arguments to that command. The arguments are separated by spaces (or tabs) and the command is terminated by a <newline>.
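Esim's own parser is not shown in this manual; the Python sketch below, using a hypothetical parse_command helper, only illustrates the command syntax just described. Since several commands (?, !, =, @, <, >, |) are punctuation rather than letters, the sketch accepts any single character as the command.

```python
from typing import List, Optional, Tuple

def parse_command(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split one command line into (command character, argument list)."""
    fields = line.split()              # any run of spaces or tabs separates fields
    if not fields:
        return None                    # blank line: nothing to execute
    cmd, args = fields[0], fields[1:]
    if len(cmd) != 1:
        raise ValueError(f"expected a one-character command, got {cmd!r}")
    return cmd, args

print(parse_command("w node1 -node2\tnode3\n"))
```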

To run esim type

esim file1 file2 ...

Esim will read and execute commands, first from file1, then file2, etc. If one of the file names is preceded by a '-', then that file becomes the new output file (the default output is stdout). For example,

esim f.sim -f.out g.sim

This would cause esim to read commands from f.sim, sending output to the default output. When f.sim was exhausted, f.out would become the new output file, and the commands in g.sim would be executed.

After all the files have been processed, and if the "q" command has not terminated the simulation run, esim will accept further commands from the user, prompting for each one like so:

sim>

The user can type individual commands or direct esim to another file using the "@" command:

sim> @ patchfile.sim

This command would cause esim to read commands from "patchfile.sim", returning to interactive input when the file was exhausted.

It is common to have an initial network file prepared by a node extractor, with perhaps a patch file or two prepared by hand. After reading these files into the simulator, the user would then interactively direct esim. This could be accomplished as follows:

esim file.sim patch.1 patch.2

After reading the files, esim would prompt for the first command. Or we could have typed:

% esim file.sim
sim> @ patch.1
sim> @ patch.2
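The file-argument rule above can be modeled in a few lines. This Python sketch (plan_run is a hypothetical name, and it models only the ordering rule, not command execution) pairs each command file with the output file in effect while that file is read.

```python
def plan_run(args: list[str]) -> list[tuple[str, str]]:
    """Pair each command file with the output destination in effect while it is read."""
    output = "stdout"                  # the default output
    plan = []
    for arg in args:
        if arg.startswith("-"):
            output = arg[1:]           # '-name' makes file 'name' the new output
        else:
            plan.append((arg, output))
    return plan

print(plan_run(["f.sim", "-f.out", "g.sim"]))
```

For the example command line, f.sim is read with output going to stdout and g.sim with output going to f.out.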

Network Manipulation Commands

The electrical network to be simulated is made up of enhancement and depletion mode transistors interconnected by nodes. Components can be added to the network with the following commands:

e gate source drain

e gate source drain length width key xpos ypos area

Adds an enhancement mode transistor to the network with the specified gate, source, and drain nodes. The longer form includes size and location information as provided by the node extractor; when making patches, the short form is usually used.

d gate source drain

d gate source drain length width key xpos ypos area

Like "e" except for depletion mode devices.

c node1 node2 cap

Increase the capacitance between node1 and node2 by cap. Esim ignores this unless either node1 or node2 is GND.

= node name1 name2 name3

Allows the user to specify synonyms for a given node. Used by the node extractor to relate user-provided node names to the node's internal name (usually just a number).

| comment ...

Lines beginning with a vertical bar are treated as comments and ignored; this is useful for deleting pieces of network in node extractor output files.

i node

Input record: output by the node extractor and not used by esim.

Currently, there is no way to remove components from the network once they have been added. You must go back to the input files and modify them (using the comment character) to exclude the components you wish removed. "N" records need not be included for new nodes the user wishes to patch into the network.



Simulator Commands

The user can specify which nodes are to have their values displayed after each simulation step:

w node1 -node2 node3 ...

Watch node1 and node3; stop watching node2. At the end of a simulation step, each watched node will be displayed like so:

node1=0 node3=X ...

To remove a node from the watched list, preface its name with a '-' in a "w" command.

W label node1 node2 ... noden

Watch bit vector. The values of nodes node1, ..., noden will be displayed as a bit vector:

label=010100 20

where the first 0 is the value of node1, the first 1 the value of node2, etc. The number displayed to the right is the value of the bit vector interpreted as a binary number; it is omitted if the vector contains an X value. There is no way to unwatch a bit vector.
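The bit-vector display rule can be sketched as follows (show_vector is a hypothetical helper, not esim code). Node values are the characters '0', '1', or 'X'; node1 is the most significant bit, which matches the label=010100 20 example and the aout/ain lines in Appendix D.

```python
def show_vector(label: str, values: list[str]) -> str:
    """Format node values ('0', '1', 'X') the way the W command is described."""
    bits = "".join(values)
    if "X" in bits:
        return f"{label}={bits}"              # numeric value omitted if any bit is X
    return f"{label}={bits} {int(bits, 2)}"   # node1 is the most significant bit

print(show_vector("label", list("010100")))
print(show_vector("aout", ["X"] * 8))
```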

Before each simulation step the user can force nodes to be either high (1) or low (0) inputs (an input's value cannot be changed by the simulator!):

h node1 node2 ...

Force each node on the argument list to be a high input. Overrides previous input commands if necessary.

l node1 node2 ...

Like "h" except forces nodes to be a low input.

x node1 node2 ...

Removes nodes from whatever input list they happen to be on. The next simulation step will determine their correct value in the circuit. This is the default state of most nodes. Note that this does not force nodes to have an "X" value; it simply removes them from the input lists.

The current value of a node can be determined in several ways:

v

View: prints the values of all watched nodes and nodes on the high and low input lists.

? node1 node2 ...

Prints a synopsis of the named nodes, including their current values and the state of all transistors that affect the value of these nodes. This is the most common way of wandering through the network in search of what went wrong...

! node1 node2 ...

For each node in the argument list, prints a list of transistors controlled by that node.

"?" and "!" allow the user to go both backwards and forwards through the network in search of the piece causing all the problems.

The simulator is invoked with the following commands:

s

Simulation step. Propagates new values for the inputs through the network and returns when the network has settled. If things don't settle, the command will never terminate; try the "w" and "D" commands to narrow down the problem.

c

Cycle once through the clock, as defined by the K command.

I

Initialize. Circuits with state are often hard to initialize because the initial value of each node is X. To cure this problem, the I command finds each node whose value is charged-X and changes it to charged-0, then runs a simulation step. Iterating the I command a couple of times often leads to a stable initialized condition (indicated when an I command takes 0 events, i.e., the circuit is stable).

Try it: if the circuit does not become stable in 3 or 4 tries, this command is probably of no use.

Miscellaneous Commands

D

Toggle debug switch; useful for debugging the simulator and/or circuit. If the debug switch is on, then during a simulation step, each time a watched node is encountered in some event, that fact is indicated to the user along with some event information. If a node keeps appearing in this printout, chances are that its value is oscillating. Vice versa, if your circuit never settles (i.e., it oscillates), you can use the "D" and "w" commands to find the node(s) causing the problem.

> filename

Write the current state of each node into the specified file; useful for making a breakpoint in your simulation run. Only values are stored, so this isn't really useful to "dump" a run for later use; see the "<" command.

< filename

Read from the specified file, reinitializing the value of each node as directed. Note that the network must already exist and be identical to the network used to create the dump file with the ">" command.

These state saving commands are really provided so that complicated initializing sequences need only be simulated once.

L

Invokes a network processor that finds all subnets corresponding to simple logic gates and converts them into a form that allows faster simulation.

Often it does the right thing, leading to a 25% to 50% reduction in the time for a single step. [We know of one case where the transformation was not transparent, so caveat simulee...]

X ...

Call extension command; provides for user extensions to the simulator.

q

Exit to system.

Local Extensions

V node vector

Define a vector of inputs for the node. The first element is initially set as the input for node; after each cycle, the next element of the vector is set as the input.

R n

Run the simulator through n cycles. If n is not present, make the run as long as the longest vector. All watched nodes are reported back as vectors.

N

Clear all previously defined input vectors.

K node1 vector1 node2 vector2 ... nodeN vectorN

Define the clock. Each cycle, nodes 1 through N must run through their respective vectors.

SEE ALSO

mextra(CAD1)
NAME

mextra - Manhattan Circuit Extractor

SYNOPSIS

mextra [-gho] [-u scale] basename

DESCRIPTION

Mextra reads an integrated circuit layout description in Caltech Intermediate Form (CIF) and creates a circuit description. From this circuit description, various electrical checks can be done on your circuit. The circuit description is directly compatible with esim, rnoserc, and powest.

Names

Mextra uses the CIF label construct to implement node names and attributes. The form of the CIF label command is as follows:

94 name x y [layer];

This command attaches the label to the mask geometry on the specified layer crossing the point (x, y). If no layer is present, then any geometry crossing the point is given the label. Mextra does not recognize the CIF user extension "0", which is used by MIT and Lincoln Labs programs (e.g., MacPitts) to indicate node labels.

Mextra interprets these labels as node names. These names are used to describe the extracted circuit. When no name is given to a node, a number is assigned to the node. A label may contain any ASCII character except space, tab, newline, double quote, comma, semicolon, and parentheses. To avoid conflict with extractor-generated names, names should not be numbers or end in '#n', where n is a number.
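Before placing labels in a layout, the rules above can be checked mechanically. This Python sketch (check_label is a hypothetical helper, not part of mextra) reports forbidden characters and names that could collide with generated ones. It assumes a trailing '#' or '!' is the local/global marker discussed in the next paragraph and strips it before the collision check.

```python
import re

FORBIDDEN = set(' \t\n",;()')                # characters a label may not contain
LOOKS_GENERATED = re.compile(r"\d+|.*#\d+")  # a bare number, or a name ending in '#n'

def check_label(label: str) -> list[str]:
    """List the problems with a proposed node label (an empty list means none found)."""
    name = label[:-1] if label.endswith(("#", "!")) else label  # drop local/global marker
    problems = []
    bad = sorted(FORBIDDEN & set(label))
    if bad:
        problems.append(f"illegal characters: {bad}")
    if LOOKS_GENERATED.fullmatch(name):
        problems.append("may clash with extractor-generated names")
    return problems

for label in ["SR.in#", "Vdd!", "42", "out#3", "a b"]:
    print(label, check_label(label))
```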



A problem arises when two nodes are given the same name although they are not connected electrically. Sometimes we want these nodes to have the same names; other times we don't. This frequently happens when a name is specified in a cell which is repeated many times. For instance, if we define a shift register cell with the input marked 'SR.in', then when we create an 8 bit shift register we could have 8 nodes named 'SR.in'. If this happens, it would appear as though all 8 of the shift register cells were shorted together. To resolve this, the extractor recognizes three different types of names: local, global, and unspecified.

Any time a local name appears on more than one node, it is appended with a unique suffix of the form '#n', where n is a number. The numbers are assigned in scanline order, starting at 0. In the shift register example, the names would be 'SR.in#0' through 'SR.in#7'. Global names do not have suffixes appended to them; thus unconnected nodes with global names will appear connected after extraction. (The -g option causes the extractor to append unique suffixes to unconnected nodes with the same global name.) Names are made local by ending them with a sharp sign, '#'. Names are global if they end with an exclamation mark, '!'. These terminating characters are not considered part of the name, however. Names which do not end with these characters are considered unspecified. Unspecified names are treated similarly to locals: multiple occurrences are appended with unique suffixes. By convention, an unspecified name signifies the designer's intention that the name is local but is connected to only one node. It is illegal to have a name that is declared as two different types; the extractor will complain if this is so and make the name local.
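The naming rules above can be sketched in Python. This is an illustration under stated assumptions, not mextra's implementation: resolve_names and classify are hypothetical names, the input list order is taken to be scanline order, and the complaint about a name declared two ways is reduced to a comment.

```python
def classify(label: str) -> tuple[str, str]:
    """Strip the terminating character and return (name, type)."""
    if label.endswith("#"):
        return label[:-1], "local"
    if label.endswith("!"):
        return label[:-1], "global"
    return label, "unspecified"

def resolve_names(labels: list[tuple[str, int]], g_option: bool = False) -> list[tuple[int, str]]:
    """labels: (label, node id) pairs in scanline order -> (node id, final name) pairs."""
    nodes_for: dict[str, list[int]] = {}
    kind: dict[str, str] = {}
    for label, node in labels:
        name, typ = classify(label)
        if kind.setdefault(name, typ) != typ:
            kind[name] = "local"             # declared two ways: mextra complains, makes it local
        nodes = nodes_for.setdefault(name, [])
        if node not in nodes:                 # the same name on the same node counts once
            nodes.append(node)
    result = []
    for name, nodes in nodes_for.items():
        suffix = len(nodes) > 1 and (kind[name] != "global" or g_option)
        for n, node in enumerate(nodes):
            result.append((node, f"{name}#{n}" if suffix else name))
    return result

print(resolve_names([("SR.in", k) for k in range(8)]))
print(resolve_names([("Vdd!", 1), ("Vdd!", 2)]))
```

The shift register example yields 'SR.in#0' through 'SR.in#7', while the two unconnected 'Vdd!' nodes share the name 'Vdd' unless g_option (standing in for -g) is set.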



Optionally, mextra will expand local and unspecified node names with the path name of the symbol instances through which they were called. With the -h option, mextra will produce node names of the form:

/call1/call2/.../callN/node-name

where callN is the name of the symbol instance which contains the label node-name, callN-1 is the name of the instance which contains callN, and so on. Named symbol instances take the following form in CIF:

91 name ; C number (a_ b] ; 

Unnamed CIF calls are assigned names of the form '#n', where n is a number.

It makes no difference to the extractor if the same name is attached to the same node several times. However, if more than one name is given to a node, then the extractor must choose which name it will use. Whenever two names are given to the same node, the extractor will assign the name with the highest type priority: global is the highest, unspecified next, and local lowest. If the names are the same type, then the extractor takes the one with the fewest slashes ('/'); if the number of slashes is equal, the shortest name is taken.

This causes the name highest up in the symbol hierarchy to be taken when hierarchical names are expanded. At the end of the log file, the extractor lists nodes with more than one name attached. These lines start with an equal sign and are readable by esim, so that it will understand these aliases.
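The selection order above maps naturally onto a sort key: type priority first, then slash count, then length. The Python sketch below (pick_name is a hypothetical helper) assumes each candidate is already a (name, type) pair with its terminator stripped; how mextra breaks a complete tie is not stated, so the sketch simply keeps the first candidate.

```python
RANK = {"global": 0, "unspecified": 1, "local": 2}   # a lower rank wins

def pick_name(candidates: list[tuple[str, str]]) -> str:
    """Choose among the (name, type) pairs attached to one node."""
    best = min(candidates, key=lambda c: (RANK[c[1]], c[0].count("/"), len(c[0])))
    return best[0]

print(pick_name([("/alu/add/c", "local"), ("/alu/carry", "local")]))
```

With hierarchical names, the fewest-slashes rule is what makes the name from highest in the symbol hierarchy win.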



Attributes 



In addition to naming nodes, mextra allows you to attach attributes to nodes. There are two types of attributes: node attributes and transistor attributes. A node attribute is attached to a node using the CIF 94 construct, just the same way as a node name. The node attribute must end in an at-sign, '@'. More than one attribute may be attached to a node. Mextra does not interpret these attributes other than to eliminate duplicates. For each attribute attached to a node, there appears a line in the .sim file in the following form:

A node attribute

Node is the node name, and attribute is the attribute attached to that node with the at-sign removed.



Transistor attributes can be attached to the gate, source, or drain of a transistor. Transistor attributes must end in a dollar sign, '$'. To attach an attribute to a transistor gate, the label must be placed inside the transistor gate region. To attach an attribute to the source or drain of a transistor, the label must be placed on the source or drain edge of the transistor. Transistor attributes are recorded in the transistor record in the .sim file. A transistor description has the following form:

type gate source drain l w x y g=attributes s=attributes d=attributes

Attributes is a comma-separated list of attributes. If no attribute is present for the gate, source, or drain, the g=, s=, or d= fields may be omitted.
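As a sketch of the record layout just described, the Python function below (transistor_record is a hypothetical name) assembles one transistor line and omits any empty g=, s=, or d= field. It assumes the attributes are passed with their terminating '$' already removed, by analogy with the at-sign removal for node attributes; the manual does not say this explicitly.

```python
def transistor_record(kind: str, gate: str, source: str, drain: str,
                      length: int, width: int, x: int, y: int,
                      g=(), s=(), d=()) -> str:
    """Build one transistor line; attribute lists are given without the '$'."""
    fields = [kind, gate, source, drain, str(length), str(width), str(x), str(y)]
    for key, attrs in (("g", g), ("s", s), ("d", d)):
        if attrs:                                    # empty fields are simply left out
            fields.append(f"{key}={','.join(attrs)}")
    return " ".join(fields)

print(transistor_record("e", "phi1", "in", "out", 2, 4, 10, 20, g=["pass", "slow"], d=["bus"]))
```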

Capacitance

The .sim file also has information about capacitance in the circuit. The lines containing capacitance information are of the form:

C node1 node2 cap-value

cap-value is the capacitance between the nodes in femto-farads. Capacitance values below a certain threshold are not reported. The default threshold is 50 femto-farads.

The extractor reports capacitance from two sources: capacitance between node and substrate, and capacitance caused by poly overlapping diffusion but not forming a transistor. Transistor capacitances are not included, since most of the tools that work on the .sim file calculate the transistor capacitance from the width and length information.



The capacitance for each layer is calculated separately. The reported node capacitance is the total of the layer capacitances of the node. The layer capacitance is calculated by taking the area of the node on that layer and multiplying it by a constant; this is added to the product of the perimeter and a constant. The default constants are given below. Area constants are in femto-farads per square micron; perimeter constants are in femto-farads per micron.

layer        area   perimeter
metal        0.03   0.0
poly         0.05   0.0
diff         0.1    0.1
poly/diff    0.4    0.0

Poly/diffusion capacitance is calculated similarly to layer capacitance: the area is multiplied by a constant, and this is added to the perimeter multiplied by a constant. Poly/diffusion capacitance is not subject to the reporting threshold, however.
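The capacitance rule above, area times an area constant plus perimeter times a perimeter constant summed over layers, can be sketched in Python with the default constants from the table. The function names are hypothetical, areas are assumed to be in square microns and perimeters in microns, and recording node-to-substrate capacitance against a node named GND is an assumption for the example.

```python
# Default constants from the table: (fF per square micron, fF per micron)
DEFAULTS = {
    "metal": (0.03, 0.0),
    "poly": (0.05, 0.0),
    "diff": (0.1, 0.1),
    "poly/diff": (0.4, 0.0),
}
THRESHOLD_FF = 50.0   # default reporting threshold

def layer_cap(layer: str, area: float, perimeter: float, consts=DEFAULTS) -> float:
    """Area times the area constant plus perimeter times the perimeter constant."""
    a, p = consts[layer]
    return area * a + perimeter * p

def node_cap(geometry: dict, consts=DEFAULTS) -> float:
    """Sum the layer capacitances; geometry maps layer -> (area, perimeter)."""
    return sum(layer_cap(layer, a, p, consts) for layer, (a, p) in geometry.items())

def cap_record(node1: str, node2: str, cap: float, thresholded: bool = True,
               threshold: float = THRESHOLD_FF):
    """Return a 'C node1 node2 cap' line, or None if the value falls below the threshold."""
    if thresholded and cap < threshold:
        return None
    return f"C {node1} {node2} {cap:g}"

geometry = {"metal": (100, 40), "poly": (200, 60), "diff": (300, 80)}
print(cap_record("n1", "GND", node_cap(geometry)))
```

Here the metal, poly, and diffusion terms are 3, 10, and 38 fF, so the 51 fF total clears the 50 fF threshold. Poly/diffusion overlap values would be passed with thresholded=False.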

The -o option suppresses the calculation of capacitance and instead gives, for each node in the circuit, the area and perimeter of that node on the diffusion, poly, and metal layers. The lines containing this information look like this:

N node diff-area diff-perim poly-area poly-perim metal-area metal-perim

Node is the node name. diff-area through metal-perim are the area and perimeter of the diffusion, poly, and metal layers in user-defined units. (In addition, the -o option causes transistors with only one terminal to be recorded in the .sim file as a transistor with source connected to drain.)



Setting Options

By default, mextra reports locations in CIF units. A more convenient form of units may be specified either in the '.cadrc' file or on the command line. The form of the command line option is:

units scale

To set units on the command line, use the -u option.

The parameters used to compute node capacitance may be changed by including the following commands in your '.cadrc' file:

areatocap layer value
perimtocap layer value

value is in atto-farads per square micron for area, and atto-farads per micron for perimeter. layer may be "poly", "diff", "metal", or "poly/diff". The threshold for reporting capacitance may be set in the '.cadrc' file with the following line:

capthreshold value

A negative value sets the threshold to infinity.
MEXTRA (CAD1)     CAD Toolbox User's Manual     MEXTRA (CAD1)

Mextra knows of two technologies, NMOS and CMOS p-well. NMOS is assumed by default. To set the technology to CMOS p-well, include the following line in your '.cadrc' file:

tech cmos-pw



FILES

~cad/lib/extname

~cad/lib/log

~cad/.cadrc

~/.cadrc

/usr/tmp/$mext*

ALSO SEE

caesar(cad1), kic(cad1), powest(cad1), cadrc(cad5)



BUGS

Accepts Manhattan simple CIF only. The length/width ratio for unusually shaped transistors may be inaccurate. Attributes for funny transistors are not recorded.
APPENDIX D

SIMULATION RESULTS FOR MULTIP8C MULTIPLIER

The first six figures show, in the order that they were produced, .int files from a MacPitts interpreter session using the source file multip8c.mac.

The last three figures show the terminal output produced by the switch level event simulation program, esim, operating on the node extraction file of the MacPitts layout for multip8c. The node extraction was performed by the mextra program.
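The register values in Figures D.1 through D.6 are consistent with a simple shift-and-add scheme in which each of the four pipeline stages examines one multiplier bit: if the low bit of the l register is 1, the multiplicand a is added to h, and then the pair (h, l) is shifted right one place, with the bit shifted out of h entering the top of l. This model is inferred from the numbers in the figures rather than taken from multip8c.mac, which is not reproduced here, and the function names below are hypothetical. The Python sketch replays it for a = 104 and b = 22, including the cascade into a second chip.

```python
def stage(a: int, h: int, l: int) -> tuple[int, int]:
    """One pipeline stage: conditionally add a to h, then shift (h, l) right one bit."""
    total = h + (a if l & 1 else 0)          # up to 9 bits; the carry shifts into h
    return total >> 1, (l >> 1) | ((total & 1) << 7)

def chip(a: int, h: int, l: int) -> list[tuple[int, int]]:
    """Four stages, giving the (hr1, lr1) ... (hr4, lr4) register pairs."""
    states = []
    for _ in range(4):
        h, l = stage(a, h, l)
        states.append((h, l))
    return states

first = chip(104, 0, 22)             # hin = 0, bin = 22
second = chip(104, *first[-1])       # cascade: hin = hout, bin = lout
print(first)    # [(0, 11), (52, 5), (78, 2), (39, 1)]
print(second)   # [(71, 128), (35, 192), (17, 224), (8, 240)]
hout, lout = second[-1]
print(hout * 256 + lout, 104 * 22)   # 2288 2288
```

The first chip's outputs (hout = 39, lout = 1) match Figure D.3 and the esim run in Figure D.7, and the second chip's final outputs (hout = 8, lout = 240) match Figure D.6; read as a 16-bit number, 8 x 256 + 240 is the expected product 2288.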



"multip8c"

MacPitts interpreter state after initial data entry.

((register a1 undefined-integer)
 (register a2 undefined-integer)
 (register a3 undefined-integer)
 (register a4 undefined-integer)
 (register hr1 undefined-integer)
 (register lr1 undefined-integer)
 (register hr2 undefined-integer)
 (register lr2 undefined-integer)
 (register hr3 undefined-integer)
 (register lr3 undefined-integer)
 (register hr4 undefined-integer)
 (register lr4 undefined-integer)
 (port ain 104 console)
 (port bin 22 console)
 (port hin 0 console)
 (port aout undefined-integer chip)
 (port hout undefined-integer chip)
 (port lout undefined-integer chip))

Figure D.1 MacPitts Interpreter Results

"multip8c"

MacPitts interpreter state after 1 clock cycle.

((register a1 104)
 (register a2 undefined-integer)
 (register a3 undefined-integer)
 (register a4 undefined-integer)
 (register hr1 0)
 (register lr1 11)
 (register hr2 undefined-integer)
 (register lr2 undefined-integer)
 (register hr3 undefined-integer)
 (register lr3 undefined-integer)
 (register hr4 undefined-integer)
 (register lr4 undefined-integer)
 (port ain 104 console)
 (port bin 22 console)
 (port hin 0 console)
 (port aout undefined-integer chip)
 (port hout undefined-integer chip)
 (port lout undefined-integer chip))

"multip8c"

MacPitts interpreter state after 2 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 undefined-integer)
 (register a4 undefined-integer)
 (register hr1 0)
 (register lr1 11)
 (register hr2 52)
 (register lr2 5)
 (register hr3 undefined-integer)
 (register lr3 undefined-integer)
 (register hr4 undefined-integer)
 (register lr4 undefined-integer)
 (port ain 104 console)
 (port bin 22 console)
 (port hin 0 console)
 (port aout undefined-integer chip)
 (port hout undefined-integer chip)
 (port lout undefined-integer chip))

Figure D.2 MacPitts Interpreter Results (Continued)

"multip8c"

MacPitts interpreter state after 3 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 undefined-integer)
 (register hr1 0)
 (register lr1 11)
 (register hr2 52)
 (register lr2 5)
 (register hr3 78)
 (register lr3 2)
 (register hr4 undefined-integer)
 (register lr4 undefined-integer)
 (port ain 104 console)
 (port bin 22 console)
 (port hin 0 console)
 (port aout undefined-integer chip)
 (port hout undefined-integer chip)
 (port lout undefined-integer chip))

"multip8c"


MacPitts interpreter state after 4 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 0)
 (register lr1 11)
 (register hr2 52)
 (register lr2 5)
 (register hr3 78)
 (register lr3 2)
 (register hr4 39)
 (register lr4 1)
 (port ain 104 console)
 (port bin 22 console)
 (port hin 0 console)
 (port aout 104 chip)
 (port hout 39 chip)
 (port lout 1 chip))

Figure D.3 MacPitts Interpreter Results (Continued)
"multip8c"

MacPitts interpreter state after 4 clock cycles and resetting the input ports to the values of the output ports. This simulates a second chip in cascade with the first.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 0)
 (register lr1 11)
 (register hr2 52)
 (register lr2 5)
 (register hr3 78)
 (register lr3 2)
 (register hr4 39)
 (register lr4 1)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 39 chip)
 (port lout 1 chip))



"multip8c"

MacPitts interpreter state after 5 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 71)
 (register lr1 128)
 (register hr2 52)
 (register lr2 5)
 (register hr3 78)
 (register lr3 2)
 (register hr4 39)
 (register lr4 1)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 39 chip)
 (port lout 1 chip))

Figure D.4 MacPitts Interpreter Results (Continued)

"multip8c"

MacPitts interpreter state after 6 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 71)
 (register lr1 128)
 (register hr2 35)
 (register lr2 192)
 (register hr3 78)
 (register lr3 2)
 (register hr4 39)
 (register lr4 1)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 39 chip)
 (port lout 1 chip))

"multip8c"

MacPitts interpreter state after 7 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 71)
 (register lr1 128)
 (register hr2 35)
 (register lr2 192)
 (register hr3 17)
 (register lr3 224)
 (register hr4 39)
 (register lr4 1)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 39 chip)
 (port lout 1 chip))

Figure D.5 MacPitts Interpreter Results (Continued)
"multip8c"

MacPitts interpreter state after 8 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 71)
 (register lr1 128)
 (register hr2 35)
 (register lr2 192)
 (register hr3 17)
 (register lr3 224)
 (register hr4 8)
 (register lr4 240)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 8 chip)
 (port lout 240 chip))



"multip8c"

MacPitts interpreter state after 9 clock cycles.

((register a1 104)
 (register a2 104)
 (register a3 104)
 (register a4 104)
 (register hr1 71)
 (register lr1 128)
 (register hr2 35)
 (register lr2 192)
 (register hr3 17)
 (register lr3 224)
 (register hr4 8)
 (register lr4 240)
 (port ain 104 console)
 (port bin 1 console)
 (port hin 39 console)
 (port aout 104 chip)
 (port hout 8 chip)
 (port lout 240 chip))

Figure D.6 MacPitts Interpreter Results (Continued)
% esim mul8c.sim mul8c.macro

1612 transistors, 1393 nodes (801 pulled up)
1612 transistors, 1398 nodes (801 pulled up)
sim> s
step took 605 events
clock=XXX
aout=XXXXXXXX
lout=XXXXXXXX
hout=XXXXXXXX
hin=00000000 0
bin=00010110 22
ain=01101000 104
sim> I
initialization took 2119 steps
sim> I
initialization took 0 steps
sim> s
step took 0 events
clock=000 0
aout=11111111 255
lout=11111111 255
hout=11111111 255
hin=00000000 0
bin=00010110 22
ain=01101000 104
sim> c
clock=101 5
aout=11111111 255
lout=01111111 127
hout=01111111 127
hin=00000000 0
bin=00010110 22
ain=01101000 104
cycle took 1433 events
sim> c
clock=101 5
aout=11111111 255
lout=00111111 63
hout=00111111 63
hin=00000000 0
bin=00010110 22
ain=01101000 104
cycle took 1210 events

Last line is repeated at top of following page.

Figure D.7 Event Simulation Results.
cycle took 1210 events
sim> c
clock=101      5
aout=11111111  255
lout=00011111  31
hout=00011111  31
hin=00000000   0
bin=00010110   22
ain=01101000   104
cycle took 1231 events
sim> c
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00000000   0
bin=00010110   22
ain=01101000   104
cycle took 1139 events
sim> c
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00000000   0
bin=00010110   22
ain=01101000   104
cycle took 1052 events
sim> @ mul8c.macro2
sim> s
step took 177 events
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00100111   39
bin=00000001   1
ain=01101000   104
sim> c

Last line is repeated at top of following page.

Figure D.8 Event Simulation Results (Continued).







sim> c
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00100111   39
bin=00000001   1
ain=01101000   104
cycle took 1164 events
sim> c
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00100111   39
bin=00000001   1
ain=01101000   104
cycle took 1154 events
sim> c
clock=101      5
aout=01101000  104
lout=00000001  1
hout=00100111  39
hin=00100111   39
bin=00000001   1
ain=01101000   104
cycle took 1131 events
sim> c
clock=101      5
aout=01101000  104
lout=11110000  240
hout=00001000  8
hin=00100111   39
bin=00000001   1
ain=01101000   104
cycle took 1123 events
sim> c
clock=101      5
aout=01101000  104
lout=11110000  240
hout=00001000  8
hin=00100111   39
bin=00000001   1
ain=01101000   104
cycle took 1052 events
sim> q
%

Figure D.9 Event Simulation Results (Continued).
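The binary node values printed by esim can be checked against the decimal column beside them and against the interpreter results. The following is a minimal Python sketch that copies a handful of display lines from Figures D.7 through D.9; the helper name decode is hypothetical, and treating the final hout/lout pair as the high and low bytes of a 16-bit product is an assumption rather than something the listing states.

```python
# A few esim display lines: (node name, binary value, decimal value shown).
samples = [
    ("lout", "01111111", 127),
    ("lout", "00111111", 63),
    ("lout", "00011111", 31),
    ("hout", "00100111", 39),
    ("lout", "11110000", 240),
    ("hout", "00001000", 8),
    ("bin", "00010110", 22),
    ("ain", "01101000", 104),
]

def decode(bits: str):
    """Hypothetical helper: unsigned value of a bit string, None if any bit is X."""
    if "X" in bits:
        return None  # undefined node value, as in the first step of the session
    return int(bits, 2)

# Every binary value agrees with the decimal value printed beside it.
for name, bits, shown in samples:
    assert decode(bits) == shown, (name, bits, shown)

# ASSUMPTION: hout/lout are the high/low bytes of the product. The final
# outputs (hout = 8, lout = 240) then equal 104 * 22, matching the
# interpreter state after 8 clock cycles.
hout, lout = decode("00001000"), decode("11110000")
assert (hout << 8) | lout == 104 * 22
print("all listed values consistent")
```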
APPENDIX E
LAYOUT PHOTOGRAPHS 



Exposure data:

  Display: AED 767 Color Graphics Terminal
  Weston Light Value: 7.5 to 8.5
  Camera: Pentax SLR, Tripod Mounted
  Film: Tri-X, ASA 400
  Lens-to-screen distance: 4 feet
  Lens: 85 mm, f/1.9
  Lens Opening: f/16
  Shutter Speed: 1 second
Figure E.1 multic (top), multip (bot).
Figure E.2 multip8 (top), multip8a (bot).
Figure E.3 multip8b (top), multip8c5 (bot).
Figure E.4 multip8c4 (top), multip8c4d (bot).
Figure E.5 Layout Errors in Xchip2.